Abstract

A nonlinear regression model is proposed as an alternative to the
Box-Cox regression model for nonnegative variables. The functional form
contains as special cases the linear, exponential, constant elasticity, and
generalized CES specifications, as well as other functional forms used by
applied econometricians. The model can be derived from, but is more general
than, a particular modification of the Box-Cox model. Because the model is
specified directly in terms of E(y|x), the parameters are easy to interpret
and economic quantities are straightforward to compute. Unlike Box-Cox type
approaches, the proposed weighted nonlinear least squares estimators of the
conditional mean function are robust to conditional variance and other
distributional misspecifications; in some leading cases they are also
asymptotically efficient. Computationally simple, robust Lagrange multiplier
statistics for various restricted versions of the model are derived. The
explained variable can be continuous, discrete, or some combination of the
two. A method for obtaining scale-invariant t-statistics is also discussed,
while the Lagrange multiplier test for exclusion restrictions is shown to be
scale invariant.


1.  Introduction 

Economists and other social scientists are often interested in
explaining a nonnegative variable y in terms of some explanatory variables
$x \equiv (x_1, x_2, \ldots, x_K)$. For many purposes this involves specifying and
estimating a model for the conditional expectation of y given x. The first
model encountered in econometrics courses, which postulates that E(y|x) is a
linear function of x, or a linear function of $\phi(x)$ for some vector function
$\phi$, often provides an inadequate description of E(y|x). In addition, the
assumption of homoskedasticity for y is frequently violated by the data,
resulting in the usual inference procedures being inappropriate. Finally,
although of less importance for asymptotic inference, the classical
assumption that y conditional on x is normally distributed is untenable
because y is nonnegative.

In  econometrics  the  most  common  alternative  to  the  linear  model  for 
E(y|x)  is  a  linear  model  for  E(log  y|x),  provided  that  P(y  >  0)  =  1. 
Normality  of  log  y  cannot  be  ruled  out  a  priori   and  heteroskedasticity  is 
often  less  of  a  problem  in  linear  models  with  log  y  as  the  dependent 
variable.   However,  the  important  issue  of  whether  the  linear  model  for  log  y 
implicitly  provides  the  best  description  of  E(y|x)  depends  on  the  particular 
application.   This  is  in  no  way  guaranteed  even  if  the  distribution  of  log  y 
given x is normal with constant variance; one cannot even investigate this
issue  unless  the  estimates  of  E(log  y|x)  can  be  transformed  into  estimates  of 
E(y|x). 

Noting that the identity and logarithmic transformations in linear
models are too restrictive for all statistical applications, Box and Cox
(1964) suggested a by now well-known transformation of y that contains the
identity and logarithmic transformations as special cases. For nonnegative
y, the Box-Cox transformation is defined as

(1.1)  $y(\lambda) \equiv (y^{\lambda} - 1)/\lambda, \quad \lambda \neq 0$

(1.2)  $y(\lambda) \equiv \log y, \quad \lambda = 0.$

The case $\lambda < 0$ is allowed only if P(y > 0) = 1.
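As a small illustration of (1.1) and (1.2), the transformation can be sketched in Python. The function name `box_cox` and the test values are assumptions made for this example only; the point is that small nonzero values of lambda give results close to log y, which is why the logarithm is the natural definition at lambda = 0.

```python
import math


def box_cox(y: float, lam: float) -> float:
    """Box-Cox transformation y(lambda) as in (1.1)-(1.2).

    Illustrative helper (not from the paper). Requires y >= 0, and
    y > 0 whenever lam <= 0, mirroring the restriction in the text.
    """
    if y < 0 or (y == 0 and lam <= 0):
        raise ValueError("need y >= 0, and y > 0 when lambda <= 0")
    if lam == 0.0:
        return math.log(y)
    return (y ** lam - 1.0) / lam


# lambda = 1 is the identity up to a shift; lambda near 0 approaches log y.
print(box_cox(4.0, 1.0))
print(box_cox(4.0, 1e-6))
print(box_cox(4.0, 0.0))
```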

In the Box-Cox regression model there is a value $\lambda \in \mathbb{R}$ such that for
some K×1 vector $\beta$ and some $\sigma^2 > 0$,

(1.3)  $y(\lambda) \mid x \sim N(x\beta, \sigma^2)$

(see also Spitzer (1982) and Hinkley and Runger (1984)). It is well known
that (1.3) cannot strictly be true unless $\lambda = 0$; (1.3) should be interpreted
only as an approximation. The inconsistency of the quasi-MLEs (QMLEs) of
$\lambda$, $\beta$, and $\sigma^2$ due to the inherent nonnormality is well documented (see, for
example, Amemiya and Powell (1981)). Also, the practice of estimating $\lambda$ and
then performing inference on $\beta$ as if $\lambda$ were known can be misleading and has
been criticized by various statisticians and econometricians (see, for
example, Amemiya and Powell (1981), Bickel and Doksum (1981), and Cohen and
Sackrowitz (1987)).

From a social scientist's point of view there is the more important
problem of interpreting the parameters $\beta$ and $\lambda$. The vector $\beta$ measures the
marginal effects of the explanatory variables on $E[y(\lambda)|x]$. But rarely is
the variable to be explained in economic studies defined arbitrarily; the
fact that y appears at all suggests that there is a natural measure of the
phenomenon of interest. If y is the variable that is important to economic
agents and/or policy makers then interest typically lies in the conditional
expectation of y given the explanatory variables. The parameters $\beta$, $\lambda$, and
$\sigma^2$ in a Box-Cox model are of interest only because they also parameterize the
conditional expectation E(y|x). Poirier and Melino (1978) derive the
relationship between $\beta$ and E(y|x) when $y(\lambda)$ is assumed to have a plausible
truncated normal distribution. They show that $\beta_j$ and $\partial E(y|x)/\partial x_j$ have the
same sign but are not equal. But the expression for E(y|x) depends crucially
on the assumed distribution for $y(\lambda)$. In the original Box-Cox model the
resulting estimates of E(y|x) are inconsistent if in fact there is no $\lambda$ that
simultaneously induces linearity of the conditional expectation,
homoskedasticity, and normality. This is a potentially serious deficiency of
Box-Cox type procedures since marginal effects, elasticities, and predicted
values of y are of primary interest in econometric studies.

This  paper  offers  an  alternative  to  the  Box-Cox  regression  model  by 
specifying  a  functional  form  for  E(y|x)  that  is  more  flexible  than  simply 
using  y  or  log  y  as  the  dependent  variable  in  a  linear  model.   The  functional 
form  analyzed  here  generalizes  those  used  by  others  in  the  literature  on 
nonlinear estimation (e.g. Mukerji (1963), Mizon (1977), Berndt and Khaled
(1979)),  and  provides  a  unified  framework  for  analyzing  and  testing  many  of 
the  regression  functions  used  in  applied  economics.   As  shown  in  section  3, 
it  is  as  flexible  as  the  Box-Cox  transformation  for  modelling  E(y|x)  but  -- 
in  contrast  to  Box-Cox  type  approaches  --  tests  about  E(y|x)  can  be 
carried  out  without  imposing  auxiliary  distributional  assumptions.   The 
current  approach  is  inherently  more  robust  than  models  specified  in  terms  of 
a  nonlinear  transformation  of  y.   (For  a  recent  example  of  the  latter 
approach,  see  MacKinnon  and  McGee  (1989)). 

The  motivation  underlying  this  paper  combines  my  belief  that  E(y|x)  and 
functionals  of  E(y|x)  are  the  objects  of  primary  interest  with  the  following 
observation made by Judge et al. (1985) in their treatment of the Box-Cox
transformation:

Despite  the  fact  that  this  transformation  may  be  useful  for  inducing 
normality  on  observations  from  skewed  distributions,  and  despite  the  fact 
that  this  section  appears  in  a  chapter  entitled  "Nonnormal  Disturbances,"  the 
main  use  of  the  Box-Cox  transformation  in  empirical  econometrics  has  been  as 
a  device  for  generalizing  functional  form.   (p. 840) 

Rather  than  searching  for  a  (possibly  nonexistent)  transformation  of  the 
explained  variable  that  simultaneously  induces  approximate  normality, 
homoskedasticity, and linearity of the conditional expectation, this paper
attempts  the  more  modest  task  of  specifying  a  functional  form  for  E(y|x)  that 
contains  the  linear,  exponential,  constant  elasticity  and  a  variety  of  other 
regression  models  as  special  cases.   I  do  not  worry  about  finding  a 
transformation  of  the  explained  variable  that  is  normally  distributed  and 
homoskedastic, as these features are not of primary importance for testing
hypotheses  in  the  social  sciences.   The  parameters  of  the  conditional  mean 
specification  are  easy  to  interpret  and  the  weighted  nonlinear  least  squares 
estimators  proposed  below  are  likely  to  be  sufficiently  precise  in  many 
applications.   In  some  cases  the  WNLS  estimators  are  fully  efficient.   This 
notwithstanding,  my  view  is  that  obtaining  robust,  possibly  inefficient 
estimates  of  economically  interesting  parameters  is  preferred  to  obtaining 
efficient  (under  correct  specification  of  the  distribution)  but  nonrobust 
estimates  of  parameters  that  are  difficult  to  interpret. 

Section  2  of  the  paper  briefly  presents  a  case  for  defining  all  economic 
quantities  in  terms  of  E(y|x),  where  y  is  the  economic  variable  to  be 
explained.   Section  3  discusses  the  basic  model  for  E(y|x),  describes  how  it 
is  obtainable  from  a  modified  version  of  the  Box-Cox  model,  and  derives  the 
asymptotic  covariance  matrix  of  the  weighted  nonlinear  least  squares 
estimator. Section 4 derives simple Lagrange multiplier (LM) tests for
exclusion restrictions and for the linear and exponential special cases; both
standard LM tests and LM tests that are robust to conditional variance
misspecification are covered. Section 5 extends the model to allow for
Box-Cox transformations of some of the explanatory variables and discusses
testing in the more general model. The important issue of obtaining scale
invariant test statistics is treated in section 6. Some practical
considerations are discussed in section 7, and section 8 contains concluding
remarks.

2.  Some  Considerations  when  Choosing  Functional  Form 

Transformations  of  the  explained  and  explanatory  variables  are  used 
quite  liberally  in  the  social  sciences,  often  without  regard  for  the 
implications  for  interpreting  parameter  estimates.   The  most  common 
transformation  for  positive  variables  is  the  logarithmic  transformation.   If 
y and x are positive random scalars, a popular model is

(2.1)  $E(\log y \mid x) = \alpha_0 + \alpha_1 \log x.$

It follows from (2.1) that

(2.2)  $\alpha_1 = \partial E(\log y \mid x)/\partial \log x,$

and the coefficient $\alpha_1$ is usually interpreted as the elasticity of y with
respect to x. If $\log y \mid x \sim N(\alpha_0 + \alpha_1 \log x, \sigma^2)$ then

$E(y \mid x) = \exp[\alpha_0 + \sigma^2/2 + \alpha_1 \log x]$

and so

(2.3)  $\alpha_1 = \partial \log E(y \mid x)/\partial \log x.$

Thus, if log y conditional on x satisfies the assumptions of the classical
linear model then it makes no difference whether one defines the elasticity
of y with respect to x by $\partial E(\log y \mid x)/\partial \log x$ or by
$\partial \log E(y \mid x)/\partial \log x$. It is not, however, difficult to construct examples
where these quantities are not the same. If in (2.1)
$\log y \mid x \sim N(\alpha_0 + \alpha_1 \log x, 2\delta_0 + 2\delta_1 x)$ for some $\delta_0, \delta_1 > 0$, then
$E(y \mid x) = \exp[(\alpha_0 + \delta_0) + \alpha_1 \log x + \delta_1 x]$ and

$\partial \log E(y \mid x)/\partial \log x = \alpha_1 + \delta_1 x,$

which is always greater than $\partial E(\log y \mid x)/\partial \log x = \alpha_1$. Although this
example is somewhat contrived, it is not implausible, and it does illustrate
the importance of developing a unified framework in which to define economic
quantities. The definitions should be as model-free as possible and the
various relationships that hold, say, between derivatives and elasticities
when a relationship is deterministic, should carry over to the stochastic
case.
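The heteroskedastic lognormal example can be checked numerically. The sketch below assumes hypothetical parameter values and uses the lognormal mean formula to build log E(y|x); a central difference in log x then approximates the elasticity of the conditional mean, which should match the coefficient on log x plus the variance slope times x rather than the coefficient on log x alone.

```python
import math


def log_mean_y(x, a0, a1, d0, d1):
    """log E(y|x) when log y | x ~ N(a0 + a1*log x, 2*d0 + 2*d1*x)."""
    mean_log = a0 + a1 * math.log(x)
    var_log = 2.0 * d0 + 2.0 * d1 * x
    return mean_log + var_log / 2.0  # lognormal mean: exp(mean + var/2)


def elasticity_of_mean(x, a0, a1, d0, d1, h=1e-6):
    """Central-difference approximation to d log E(y|x) / d log x."""
    up = log_mean_y(x * math.exp(h), a0, a1, d0, d1)
    down = log_mean_y(x * math.exp(-h), a0, a1, d0, d1)
    return (up - down) / (2.0 * h)


# Hypothetical parameter values, chosen only for illustration.
a0, a1, d0, d1 = 0.5, 0.8, 0.1, 0.05
for x in (1.0, 2.0, 5.0):
    numeric = elasticity_of_mean(x, a0, a1, d0, d1)
    analytic = a1 + d1 * x
    print(x, numeric, analytic)
```

The gap between the two elasticity definitions grows with x in this example, because the conditional variance of log y grows with x.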

If y and x are scalars related by

y = f(x),

for a differentiable function f, then the marginal effect of x on y is simply
$\partial f(x)/\partial x$, while the elasticity of y with respect to x is

(2.4)  $\eta_{y,x} \equiv \dfrac{\partial f(x)}{\partial x} \cdot \dfrac{x}{f(x)}, \quad f(x) \neq 0.$

If x, y > 0, the elasticity can also be expressed as

$\eta_{y,x} = \dfrac{\partial \log f(x)}{\partial \log x}.$

When y and x are random variables the natural definition of the marginal
effect of x on y is the marginal effect of x on the expected value of y given
x, $\partial E(y \mid x)/\partial x$. The advantage of defining all quantities in terms of E(y|x)
is that, for example, it preserves the well-known relationship (2.4) that
holds between marginal effects and elasticities in the deterministic case.
From the simple example above this relationship is not preserved if the
elasticity is defined in terms of E(log y|x), or in terms of some expectation
other than E(y|x) (of course scaling y by a nonzero constant does not change
anything). It is also straightforward to show that the deterministic
relationships that hold between other economic quantities (e.g. elasticities
and semi-elasticities) are preserved if they are all defined in terms of
E(y|x).

This discussion extends immediately to the case of many explanatory
variables. If $x \equiv (x_1, \ldots, x_K)$ is a set of K explanatory variables then the
marginal effect of (say) $x_K$ on E(y|x), holding $x_1, \ldots, x_{K-1}$ constant, is
simply

$\partial E(y \mid x)/\partial x_K.$

The elasticity of y with respect to $x_K$, holding $x_1, \ldots, x_{K-1}$ constant, is

$\eta_{y,x_K \mid x_1,\ldots,x_{K-1}} \equiv \dfrac{\partial E(y \mid x)}{\partial x_K} \cdot \dfrac{x_K}{E(y \mid x)},$

while the percentage change in E(y|x) when $x_K$ is increased by one unit is
measured as

$\dfrac{\partial E(y \mid x)}{\partial x_K} \cdot \dfrac{1}{E(y \mid x)}.$

These measures are almost but not quite model independent. The elasticity of
y with respect to $x_K$ generally changes as the list of explanatory variables
changes. This is always the case in regression-type analyses and cannot be
avoided, even in fully nonparametric settings.

Another benefit of defining economic quantities in terms of E(y|x) is
that it circumvents the issue of whether the "disturbances" in a model for
nonnegative y are additive or multiplicative. When y ≥ 0 and interest
centers on $\mu(x) \equiv E(y \mid x)$, the disturbances can be multiplicative or additive:
the model

(2.5)  $y = \mu(x) + \varepsilon, \quad E(\varepsilon \mid x) = 0$

is observationally equivalent to the model

(2.6)  $y = \mu(x)e, \quad e \geq 0, \quad E(e \mid x) = 1.$

(Simply define $\varepsilon \equiv y - E(y \mid x)$ and $e \equiv y/\mu(x)$ if $P[\mu(x) > 0] = 1$.) If $\varepsilon$ and $e$
are assumed to be independent of x, then the models do differ in the
conditional second moment properties of y: model (2.5) corresponds to the
assumption that V(y|x) is constant while (2.6) implies that V(log y|x) is
constant. Although distinguishing between the two variance assumptions might
be important for efficiency reasons, it is not necessary for estimating
E(y|x) or for obtaining hypothesis tests about E(y|x) with known asymptotic
size.

In  summary,  the  point  of  this  section  is  that  all  economic  quantities 
should  be  defined  in  terms  of  E(y|x)  once  the  list  of  explanatory  variables 
has  been  specified.   This  avoids  the  arbitrariness  that  arises  if  various 
transformations  of  the  explained  variable  are  entertained,  and  is  a  natural 
extension  from  the  deterministic  case.   (One  could  argue  that  using  the 
conditional  median  of  y  given  x  is  another  natural  extension,  but  addressing 
this  issue  is  beyond  the  scope  of  this  paper.)   The  variable  to  be  explained 
in  most  econometric  studies  is  rarely  defined  arbitrarily  (except  possibly 
for the units of measurement, which only affect the scaling), so that
estimates of $E[\varphi(y) \mid x]$ for some nonlinear transformation $\varphi(y)$ are useful
only if enough structure has been imposed to recover E(y|x). Adopting the
measures suggested in this section imposes a uniformity across researchers:
the definition is common to all, even though different researchers might use
different functional forms for E(y|x) and different methods of estimating
E(y|x). The common definition facilitates goodness of fit comparisons.

These  points  are  certainly  not  unique  to  this  paper;  similar 
observations  have  been  made  in  the  context  of  specific  models  by  Goldberger 
(1968),  Poirier  (1978),  Poirier  and  Melino  (1978),  Huang  and  Kelingos  (1979), 
Huang  and  Grawe  (1980),  and  others.   But  the  current  paper  is  motivated  by 
the  observation  that,  in  a  parametric  framework,  the  only  way  to  estimate 
E(y|x)  consistently  without  imposing  additional  distributional  assumptions  is 
to  specify  a  model  for  E(y|x)  directly. 

3.  Specification  and  Estimation  of  the  Basic  Model 

Let y be a nonnegative random variable and let $x \equiv (x_1, x_2, \ldots, x_K)$ be a
1×K vector of explanatory variables. Typically the first element $x_1$ is
unity. Without any assumptions on the conditional distribution of y given x
(except that its support is contained in $[0,\infty)$), consider the following model
for E(y|x):

(3.1)  $E(y \mid x) = [1 + \lambda x\beta]^{1/\lambda}, \quad \lambda \neq 0$

(3.2)  $E(y \mid x) = \exp(x\beta), \quad \lambda = 0.$

Technically, it would be more precise to replace $(\beta,\lambda)$ in (3.1) and (3.2) by
something such as $(\beta_0,\lambda_0)$ to distinguish the "true" parameters from the
generic parameter vector $(\beta,\lambda)$. As this results in a notational nightmare,
$(\beta,\lambda)$ is used to denote the true values as well as generic values. It should
be clear from the context which is intended. As usual, the vector x can
contain nonlinear transformations of an underlying set of explanatory
variables (in contrast to y, which should be the economic quantity of
interest).


For (3.1) to be well-defined the inequality

(3.3)  $1 + \lambda x\beta \geq 0$  (with strict inequality for $\lambda < 0$)

must hold for all relevant values of x. This is analogous to requiring
nonnegativity of the regression function in a Box-Cox model when $\lambda \neq 0$.
Equation (3.2) is the natural definition of the regression function at $\lambda = 0$
as it is the limiting case of (3.1):

$\lim_{\lambda \to 0} [1 + \lambda x\beta]^{1/\lambda} = \exp(x\beta).$

Incidentally, y need not be continuously distributed on $[0,\infty)$ for (3.1) and
(3.2) to make sense; for example, y could be a count variable.
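The mean function in (3.1) and (3.2), together with the domain restriction (3.3), can be sketched as follows. The function name `cond_mean` and the sample values of x and beta are assumptions for illustration; the example simply shows the linear case at lambda = 1 and the approach to the exponential case as lambda shrinks toward zero.

```python
import math
from typing import Sequence


def cond_mean(x: Sequence[float], beta: Sequence[float], lam: float) -> float:
    """E(y|x) under (3.1) for lam != 0 and (3.2) for lam == 0.

    Illustrative helper: x and beta are equal-length sequences, so the
    index below is the scalar x*beta.
    """
    index = sum(xj * bj for xj, bj in zip(x, beta))
    if lam == 0.0:
        return math.exp(index)
    base = 1.0 + lam * index
    # Inequality (3.3): base >= 0, strictly positive when lam < 0.
    if base < 0.0 or (lam < 0.0 and base == 0.0):
        raise ValueError("inequality (3.3) fails at this x")
    return base ** (1.0 / lam)


x = [1.0, 0.5]      # hypothetical regressors, first element is the constant
beta = [0.2, 0.4]   # hypothetical coefficients, so x*beta = 0.4
print(cond_mean(x, beta, 1.0))    # linear case: 1 + x*beta
print(cond_mean(x, beta, 1e-6))   # close to the exponential case
print(cond_mean(x, beta, 0.0))    # exp(x*beta)
```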

When $\lambda = 1$, (3.1) reduces to a linear model for E(y|x). The exponential
regression model (3.2) is particularly appealing for nonnegative explained
variables because it ensures that the predicted values are well-defined and
positive for all x and any value of $\beta$. Moreover, $\beta_j$ measures the constant
percentage change in E(y|x) when $x_j$ is increased by one unit (holding
$x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_K$ constant). For the general model (3.1), note that

$\partial E(y \mid x)/\partial x_j = [1 + \lambda x\beta]^{(1/\lambda)-1} \beta_j,$

and therefore

$\dfrac{\partial E(y \mid x)}{\partial x_j} \cdot \dfrac{1}{E(y \mid x)} = \dfrac{\beta_j}{1 + \lambda x\beta}.$

The conditional mean functions (3.1) and (3.2) can be derived from a
modified version of the Box-Cox model if P(y > 0) = 1. Recall that the
Box-Cox conditional mean assumption is that for some $\beta$ and $\lambda$,

(3.4)  $E(y(\lambda) \mid x) = x\beta, \quad \lambda \neq 0$

(3.5)  $E(\log y \mid x) = x\beta, \quad \lambda = 0.$

Rearranging (3.4) yields

(3.6)  $E(y^{\lambda} \mid x) = 1 + \lambda x\beta, \quad \lambda \neq 0.$

Suppose now that y|x is lognormally distributed, so that

(3.7)  $E(y \mid x) = \exp[V(\log y \mid x)/2 + E(\log y \mid x)].$

Letting $m(x) \equiv E(\log y \mid x)$ and $h^2(x) \equiv V(\log y \mid x)$, (3.7) can be expressed as

(3.8)  $m(x) = \log E(y \mid x) - h^2(x)/2.$

Moreover, $w(\lambda) \equiv y^{\lambda}$ also has a lognormal distribution for $\lambda \neq 0$ with
$E(\log w(\lambda) \mid x) = \lambda m(x)$ and $V(\log w(\lambda) \mid x) = \lambda^2 h^2(x)$. If it is assumed in
addition that $h^2(x) = \tau^2$ for all x then, for $\lambda \neq 0$,

$\lambda m(x) = \log[E(y^{\lambda} \mid x)] - \lambda^2\tau^2/2 = \log(1 + \lambda x\beta) - \lambda^2\tau^2/2$

or

(3.9)  $m(x) = -\lambda\tau^2/2 + (1/\lambda)\log(1 + \lambda x\beta).$

Finally, the desired quantity $\mu(x) \equiv E(y \mid x)$ can be solved for:

$\mu(x) = \exp[\tau^2/2 + m(x)]$

$\phantom{\mu(x)} = \exp[\tau^2/2 + (1/\lambda)\log(1 + \lambda x\beta) - \lambda\tau^2/2]$

(3.10)  $\phantom{\mu(x)} = \exp[(1-\lambda)\tau^2/2](1 + \lambda x\beta)^{1/\lambda}, \quad \lambda \neq 0.$

The Box-Cox regression model, with the assumption of normality and
homoskedasticity for log y (rather than the assumption that $y(\lambda)$ is normally
distributed and homoskedastic), yields the following functional form for
E(y|x):

(3.11)  $E(y \mid x) = \exp[(1-\lambda)\tau^2/2](1 + \lambda x\beta)^{1/\lambda}, \quad \lambda \neq 0$

(3.12)  $E(y \mid x) = \exp(\tau^2/2 + x\beta), \quad \lambda = 0.$

Equations (3.11) and (3.12) are of the same form as (3.1) and (3.2) up to
scale. However, it should be stressed that the subsequent analysis assumes
only that (3.1) or (3.2) holds; no additional distributional assumptions are
imposed, except those implicit in the underlying regularity conditions.
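The algebra leading to (3.10) can be verified with plain arithmetic, since every step uses only the lognormal moment formula. The sketch below assumes hypothetical values for the index, the transformation parameter, and the log variance; it rebuilds m(x) from (3.9), confirms that the implied moment of the transformed variable matches (3.6), and compares the resulting mean of y with the closed form in (3.10).

```python
import math


def mean_via_lognormal(xb: float, lam: float, tau2: float) -> float:
    """E(y|x) implied by (3.6) plus log y | x ~ N(m(x), tau2), lam != 0."""
    m = -lam * tau2 / 2.0 + math.log(1.0 + lam * xb) / lam  # equation (3.9)
    # Sanity check on (3.6): E(y^lam | x) = exp(lam*m + lam^2*tau2/2).
    moment = math.exp(lam * m + lam ** 2 * tau2 / 2.0)
    if abs(moment - (1.0 + lam * xb)) > 1e-10:
        raise AssertionError("m(x) is inconsistent with (3.6)")
    return math.exp(m + tau2 / 2.0)  # lognormal mean, equation (3.7)


def mean_closed_form(xb: float, lam: float, tau2: float) -> float:
    """Right-hand side of (3.10)."""
    return math.exp((1.0 - lam) * tau2 / 2.0) * (1.0 + lam * xb) ** (1.0 / lam)


# Hypothetical values of x*beta, lambda, and tau^2.
for xb, lam, tau2 in [(0.7, 0.5, 0.3), (1.2, 2.0, 0.1), (0.4, -0.5, 0.2)]:
    print(mean_via_lognormal(xb, lam, tau2), mean_closed_form(xb, lam, tau2))
```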

To estimate $\beta$ and $\lambda$ by nonlinear least squares (NLS) or weighted NLS
(WNLS), the derivatives of the regression function are needed. Define the
(K+1)×1 parameter vector $\theta \equiv (\beta', \lambda)'$ and express the parameterized regression
function for E(y|x) as

(3.13)  $\mu(x;\theta) \equiv [1 + \lambda x\beta]^{1/\lambda}, \quad \lambda \neq 0$

$\phantom{(3.13)}\quad \mu(x;\theta) \equiv \exp(x\beta), \quad \lambda = 0.$

For $\lambda \neq 0$ the gradient of $\mu(x;\theta)$ with respect to $\beta$ is the 1×K vector

(3.14)  $\nabla_{\beta}\mu(x;\theta) = [1 + \lambda x\beta]^{(1/\lambda)-1} x.$

For $\lambda = 0$,

(3.15)  $\nabla_{\beta}\mu(x;\beta,0) = \exp(x\beta)x.$

Expression (3.15) is also obtained by taking the limit of (3.14):

$\lim_{\lambda \to 0} \nabla_{\beta}\mu(x;\beta,\lambda) = \exp(x\beta)x = \nabla_{\beta}\mu(x;\beta,0).$

Next consider the derivative of $\mu(x;\beta,\lambda)$ with respect to $\lambda$ for $\lambda \neq 0$.
For $1 + \lambda c > 0$, let $q(\lambda) \equiv [1 + \lambda c]^{1/\lambda}$. Then

$\log q(\lambda) = (1/\lambda)\log(1 + \lambda c)$

so that

$q'(\lambda)/q(\lambda) = [\lambda c - (1+\lambda c)\log(1+\lambda c)]/[\lambda^2(1+\lambda c)]$

or

$q'(\lambda) = [1 + \lambda c]^{1/\lambda}[\lambda c - (1+\lambda c)\log(1+\lambda c)]/[\lambda^2(1+\lambda c)].$

Substituting $c \equiv x\beta$ yields

(3.16)  $\nabla_{\lambda}\mu(x;\beta,\lambda) = [1+\lambda x\beta]^{1/\lambda}[\lambda x\beta - (1+\lambda x\beta)\log(1+\lambda x\beta)]/[\lambda^2(1+\lambda x\beta)].$

There are two cases of particular interest, $\lambda = 0$ and $\lambda = 1$. For $\lambda = 1$,

(3.17)  $\nabla_{\lambda}\mu(x;\beta,1) = x\beta - (1 + x\beta)\log(1 + x\beta) = x\beta - E(y \mid x)\log E(y \mid x).$

The case $\lambda = 0$ can be obtained by computing the limit of $q'(\lambda)$ as $\lambda \to 0$.
Applying L'Hôpital's rule twice, it can be shown that

$\lim_{\lambda \to 0} \dfrac{\lambda c - (1+\lambda c)\log(1+\lambda c)}{\lambda^2(1+\lambda c)} = -c^2/2$

so that

(3.18)  $\lim_{\lambda \to 0} q'(\lambda) = -\exp(c)c^2/2.$

Thus the derivative of the regression function with respect to $\lambda$ at $\lambda = 0$ is

(3.19)  $\nabla_{\lambda}\mu(x;\beta,0) = -\exp(x\beta)(x\beta)^2/2.$

Equation (3.17) is the basis for the LM statistic for the hypothesis $H_0: \lambda = 1$,
while (3.19) is the basis for the LM test of $H_0: \lambda = 0$; both tests are
developed in section 4.
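The derivative formulas (3.14) through (3.19) lend themselves to a finite-difference check. The sketch below works with the scalar index c = x*beta, so the gradient with respect to beta is the scalar factor returned by `dmu_dindex` times x. All function names and the step size are assumptions made for this illustration.

```python
import math


def mu(c: float, lam: float) -> float:
    """Regression function (3.13) written in terms of the index c = x*beta."""
    return math.exp(c) if lam == 0.0 else (1.0 + lam * c) ** (1.0 / lam)


def dmu_dindex(c: float, lam: float) -> float:
    """Scalar factor multiplying x in (3.14) and (3.15)."""
    return math.exp(c) if lam == 0.0 else (1.0 + lam * c) ** (1.0 / lam - 1.0)


def dmu_dlam(c: float, lam: float) -> float:
    """Derivative with respect to lambda: (3.16), with (3.19) at lam = 0."""
    if lam == 0.0:
        return -math.exp(c) * c ** 2 / 2.0
    b = 1.0 + lam * c
    return b ** (1.0 / lam) * (lam * c - b * math.log(b)) / (lam ** 2 * b)


def central_diff(f, t: float, h: float = 1e-4) -> float:
    """Two-sided finite-difference approximation to f'(t)."""
    return (f(t + h) - f(t - h)) / (2.0 * h)


c = 0.4  # hypothetical value of x*beta
for lam in (0.0, 0.5, 1.0):
    fd_lam = central_diff(lambda l: mu(c, l), lam)
    fd_idx = central_diff(lambda z: mu(z, lam), c)
    print(lam, dmu_dlam(c, lam), fd_lam, dmu_dindex(c, lam), fd_idx)
```

At lambda = 1 the analytic value reduces to the expression in (3.17), and at lambda = 0 the finite difference straddles zero, which exercises the limiting argument behind (3.18).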

Under the assumption that (3.1) or (3.2) holds, $\theta$ can be consistently
estimated by NLS (under (3.2), $\theta \equiv \beta$). In many cases the NLS estimator would
have a large asymptotic covariance matrix due to the substantial
heteroskedasticity in many nonnegative economic variables. Gains in
efficiency are possible by using a weighted NLS procedure. To this end, let
$\omega(x;\gamma) > 0$ be a weighting function that can depend on an M×1 vector of
parameters $\gamma$. The conditional mean parameters $\beta$ and $\lambda$ may be included in $\gamma$.
Setting $\omega(x;\gamma) \equiv 1$ yields NLS. Other popular choices for $\omega$ are

(3.20)  $\omega(x;\gamma) = [\mu(x;\theta)]^2 \quad (\gamma \equiv \theta)$

(3.21)  $\omega(x;\gamma) = \mu(x;\theta) \quad (\gamma \equiv \theta)$

(3.22)  $\omega(x;\gamma) = \exp(x\gamma).$

Equation (3.20) would be appropriate if V(y|x) is proportional to $[E(y \mid x)]^2$,
while (3.21) is appropriate if V(y|x) is proportional to E(y|x). It must be
emphasized that in what follows the weighting function is not assumed to be
correctly specified for V(y|x). In other words, it is not assumed that

(3.23)  $V(y \mid x) = \sigma^2\omega(x;\gamma)$ for some $\gamma \in \mathbb{R}^M$ and some $\sigma^2 > 0$,

but only that such considerations motivate the choice of $\omega$. The idea is that
nonconstant weighting functions can result in efficiency gains even if (3.23)
does not hold. Because the primary goal is to test hypotheses about the
conditional mean parameters $\theta$, the inference should be robust to violations
of (3.23).

The robust asymptotic variance-covariance matrix of the WNLS estimators
of $\beta$ and $\lambda$ can be obtained by using the approach of White (1980). Let
$\{(x_t, y_t): t = 1, 2, \ldots\}$ be a sequence of random vectors following the regression
model (3.1) or (3.2), and suppose there are N observations available. Assume
either that these observations are independent or that they constitute a time
series such that

(3.24)  $E(y_t \mid x_t) = E(y_t \mid x_t, \phi_{t-1}), \quad t = 1, 2, \ldots,$

where $\phi_{t-1} \equiv (x_{t-1}, y_{t-1}, x_{t-2}, y_{t-2}, \ldots)$ denotes information observed at time
t-1 ($x_t$ can contain lagged dependent variables as well as other explanatory
variables). Equation (3.24) ensures that the errors $\{\varepsilon_t \equiv y_t - E(y_t \mid x_t):
t = 1, 2, \ldots\}$ are serially uncorrelated. It should be kept in mind that when $x_t$
contains lagged y, inequality (3.3) imposes restrictions on the distribution
of $y_t$ that are generally difficult to characterize. Also, time series
regressions for which (3.24) does not hold can be accommodated, but the
formulas for the asymptotic covariance matrix derived below (in particular,
the formula for $B_N$) would have to be modified along the lines of White and
Domowitz (1984) and Newey and West (1987).

To compute the WNLS estimator of $\theta$, estimates of the weighting functions
are needed. Let $\hat{\gamma}$ denote an estimator such that $\hat{\gamma}$ would be consistent for $\gamma$
if (3.23) held. In general, plim $\hat{\gamma} = \gamma^*$, where $\gamma^*$ need not have an
interpretation as "true" parameters unless (3.23) holds. If $\omega$ is chosen as in
(3.20) or (3.21) then $\hat{\gamma}$ would correspond to initial estimators of $(\beta,\lambda)$, for
example the NLS estimators. If $\omega$ is given by (3.22) then $\hat{\gamma}$ can be obtained
by nonlinear least squares estimation using the squared NLS residual $\hat{\varepsilon}_t^2$ as
the regressand.

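Define the weights $\hat{\omega}_t \equiv \omega(x_t;\hat{\gamma})$ (actually these are the inverse of the
weights). Then the WNLS estimator $\hat{\theta}$ of $\theta$ solves

$\min_{\theta} \sum_{t=1}^{N} (y_t - \mu(x_t;\theta))^2/\hat{\omega}_t.$

Let $\mu_t(\theta) \equiv \mu(x_t;\theta)$, $\omega_t(\gamma) \equiv \omega(x_t;\gamma)$, $\varepsilon_t \equiv y_t - \mu_t(\theta)$, $\omega_t^* \equiv \omega_t(\gamma^*)$,
$\varepsilon_t^* \equiv \varepsilon_t/\sqrt{\omega_t^*}$, and $\nabla_{\theta}\mu_t^* \equiv \nabla_{\theta}\mu_t(\theta)/\sqrt{\omega_t^*}$. Define the (K+1)×(K+1) matrices

$A_N \equiv N^{-1}\sum_{t=1}^{N} E[\nabla_{\theta}\mu_t^{*\prime}\nabla_{\theta}\mu_t^*]$

$B_N \equiv N^{-1}\sum_{t=1}^{N} E[(\varepsilon_t^*\nabla_{\theta}\mu_t^*)'(\varepsilon_t^*\nabla_{\theta}\mu_t^*)].$

Then, under general heteroskedasticity of unknown form,

$\sqrt{N}(\hat{\theta} - \theta) \overset{a}{\sim} N(0, A_N^{-1}B_N A_N^{-1}).$

If (3.23) holds then $\gamma^* = \gamma$ and the asymptotic variance of $\sqrt{N}(\hat{\theta} - \theta)$ takes
the more familiar form $\sigma^2 A_N^{-1}$. In the general case a consistent estimator of
$V_N \equiv A_N^{-1}B_N A_N^{-1}$ is given by the White (1980) variance-covariance matrix
estimator $\hat{V}_N \equiv \hat{A}_N^{-1}\hat{B}_N\hat{A}_N^{-1}$, where

$\hat{A}_N \equiv N^{-1}\sum_{t=1}^{N} \nabla_{\theta}\hat{\mu}_t'\nabla_{\theta}\hat{\mu}_t/\hat{\omega}_t = N^{-1}\sum_{t=1}^{N} \nabla_{\theta}\tilde{\mu}_t'\nabla_{\theta}\tilde{\mu}_t,$

$\hat{B}_N \equiv N^{-1}\sum_{t=1}^{N} (\tilde{\varepsilon}_t\nabla_{\theta}\tilde{\mu}_t)'(\tilde{\varepsilon}_t\nabla_{\theta}\tilde{\mu}_t) = N^{-1}\sum_{t=1}^{N} \tilde{\varepsilon}_t^2\nabla_{\theta}\tilde{\mu}_t'\nabla_{\theta}\tilde{\mu}_t,$

$\hat{\varepsilon}_t \equiv y_t - \mu_t(\hat{\theta})$, $\nabla_{\theta}\hat{\mu}_t \equiv \nabla_{\theta}\mu_t(\hat{\theta})$, $\tilde{\varepsilon}_t \equiv \hat{\varepsilon}_t/\sqrt{\hat{\omega}_t}$, and $\nabla_{\theta}\tilde{\mu}_t \equiv \nabla_{\theta}\hat{\mu}_t/\sqrt{\hat{\omega}_t}$. The
asymptotic standard error of $\hat{\theta}_j$ is the square root of the jth diagonal
element of $\hat{V}_N/N$ and, at the risk of abusing the notion of convergence in
distribution, one treats $\hat{\theta}$ as if $\hat{\theta} \sim N(\theta, \hat{V}_N/N)$. Carrying out inference on $\theta$
is now straightforward in cross section contexts and in time series contexts
with ergodic processes and correctly specified dynamics.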
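The robust covariance estimator above can be sketched with NumPy. The function name `robust_wnls_cov` and the simulated data are assumptions for this illustration, and the residuals, gradient rows, and weighting values are taken as given (they would come from a WNLS fit that is not shown here). The demonstration uses the linear special case with unit weights, where the gradient with respect to the slope parameters is just the regressor matrix and the estimator should coincide with the familiar White heteroskedasticity-robust matrix for least squares.

```python
import numpy as np


def robust_wnls_cov(resid, grad, omega):
    """Estimate of A^{-1} B A^{-1} / N for a WNLS estimator.

    resid : (N,)   residuals y_t - mu_t(theta_hat)
    grad  : (N, P) rows are gradients of mu_t evaluated at theta_hat
    omega : (N,)   positive values of the estimated weighting function
    """
    resid = np.asarray(resid, dtype=float)
    grad = np.asarray(grad, dtype=float)
    root_w = np.sqrt(np.asarray(omega, dtype=float))
    n = resid.shape[0]
    g_tilde = grad / root_w[:, None]    # gradients divided by sqrt(omega)
    e_tilde = resid / root_w            # residuals divided by sqrt(omega)
    a_hat = g_tilde.T @ g_tilde / n
    score = g_tilde * e_tilde[:, None]  # rows are e_tilde_t * g_tilde_t
    b_hat = score.T @ score / n
    a_inv = np.linalg.inv(a_hat)
    return a_inv @ b_hat @ a_inv / n


# Hypothetical data: linear mean, error spread that depends on the regressor.
rng = np.random.default_rng(0)
n_obs = 200
X = np.column_stack([np.ones(n_obs), rng.normal(size=n_obs)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n_obs) * (1.0 + np.abs(X[:, 1]))
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ b_ols

V = robust_wnls_cov(resid, X, np.ones(n_obs))
print(np.sqrt(np.diag(V)))  # robust standard errors
```

For the full model the gradient matrix would stack the beta gradient from (3.14) and the lambda derivative from (3.16) in each row.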

Although the WNLS estimator is not always asymptotically equivalent to
the maximum likelihood estimator (which is not defined in the present
context), it is of course the efficient WNLS estimator if $V(y_t \mid x_t) =
\sigma^2\omega(x_t;\gamma)$ and $\hat{\gamma}$ is $\sqrt{N}$-consistent for $\gamma$. In addition, there are some
important cases where the WNLS estimator of $\theta$ achieves the asymptotic
Cramer-Rao lower bound. If the conditional distribution of y given x is
exponential with conditional mean $\mu(x;\theta)$ then the weighting function in
(3.20) produces a WNLS estimator that is asymptotically equivalent to the
maximum likelihood estimator. Typically the initial estimator of $\theta$ used in
constructing $\hat{\omega}_t$ would be the NLS estimator. If y conditional on x has a
Poisson distribution with mean $\mu(x;\theta)$ then $\omega(x;\gamma)$ given by (3.21) produces
the WNLS estimator that is asymptotically efficient. More generally, the MLEs
of conditional mean parameters for densities in the linear exponential family
(LEF) have asymptotically equivalent WNLS counterparts. An alternative to
WNLS estimation is to simply maximize the appropriate log-likelihood function
associated with the LEF density. The expression for the standard errors
derived above is still valid by letting $\hat{\omega}_t$ be the estimated variance from the
distribution. For more on estimation and specification testing in these
models, see Gourieroux, Monfort, and Trognon (1984) and Wooldridge (1989).

It is fairly well-known that if $\omega(x;\gamma)$ is misspecified for V(y|x) then
it is possible to construct a generalized method of moments (GMM) estimator
that is more efficient than the WNLS estimator. This introduces additional
complications into the estimation and inference procedures that are beyond
the scope of this paper. If the weighting function is chosen carefully then
the WNLS estimators are likely to be sufficiently precise for many
applications. And if the weighting function is approximately proportional to
V(y|x) then the WNLS estimator would typically have better finite sample
properties than the GMM estimator.

4.  Lagrange  Multiplier  Tests  for  the  Linear  and  Exponential  Models 

Joint estimation of $\beta$ and $\lambda$ can be difficult if the restrictions

(4.1)  $1 + \hat\lambda x_t\hat\beta > 0$,   t = 1,...,N

have to be imposed. Before estimating the full model it makes sense to test
whether some easily estimated restricted version is sufficient. The two
cases of primary interest are $\lambda = 0$, which leads to an exponential
regression model, and $\lambda = 1$, which is the standard linear model
(without the assumptions of normality or homoskedasticity).

Before developing these tests, it is useful to briefly review the general
procedure for computing the LM statistic. First consider the case where
$\hat\omega_t$ is correctly specified for $V(y_t|x_t)$. Write the conditional
mean function as $\mu_t(\beta,\alpha)$ such that the null hypothesis can be
expressed as

$H_0: \alpha = \alpha_0$,

where $\beta$ is P×1 and $\alpha$ is Q×1. Let $\hat\beta$ be the restricted
estimator of $\beta$, let $\hat\varepsilon_t \equiv y_t - \mu_t(\hat\beta,\alpha_0)$
be the restricted residuals, and let
$\nabla_\beta\hat\mu_t \equiv \nabla_\beta\mu_t(\hat\beta,\alpha_0)$ and
$\nabla_\alpha\hat\mu_t \equiv \nabla_\alpha\mu_t(\hat\beta,\alpha_0)$ be the
gradients evaluated at the restricted estimates. In the context of WNLS,
quantities denoted with a "~" are the corresponding "^" variables weighted by
$1/\sqrt{\hat\omega_t}$ (e.g. $\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$).
The usual LM statistic is $NR_u^2$ from the regression

(4.2)  $\tilde\varepsilon_t$  on  $\nabla_\beta\tilde\mu_t$, $\nabla_\alpha\tilde\mu_t$,   t = 1,...,N,

where $R_u^2$ is the uncentered r-squared. Under $H_0: \alpha = \alpha_0$ and
the assumption that $V(y_t|x_t) = \sigma^2\omega(x_t;\gamma)$ for some $\gamma$
and $\sigma^2$, $NR_u^2$ from (4.2) is asymptotically $\chi^2_Q$.

A form of the LM statistic that is robust to variance misspecification is
not much more difficult to compute, and is originally due to Davidson and
MacKinnon (1985b) for the case of unconditional heteroskedasticity and
independent errors. The following procedure is taken from Wooldridge (1990)
and, except that it relaxes the assumption of a correctly specified variance,
it is valid under essentially the same regularity conditions needed for the
nonrobust LM statistic. First, compute $\tilde\varepsilon_t$,
$\nabla_\beta\tilde\mu_t$, and $\nabla_\alpha\tilde\mu_t$ as above. Next, save
the 1×Q vectors of residuals, say $\tilde r_t$, from the regression

(4.3)  $\nabla_\alpha\tilde\mu_t$  on  $\nabla_\beta\tilde\mu_t$,   t = 1,...,N.

(Note that $\tilde r_t$ is implicitly weighted by $1/\sqrt{\hat\omega_t}$.)
The robust LM statistic is $NR_u^2 = N - SSR$ from the regression

(4.4)  1  on  $\tilde\varepsilon_t\tilde r_t$,   t = 1,...,N,

where SSR is the sum of squared residuals. (Note that
$\tilde\varepsilon_t\tilde r_t$ is a 1×Q vector.) This LM statistic is
asymptotically $\chi^2_Q$ under $H_0$ whether or not $\hat\omega_t$ is
correctly specified for $V(y_t|x_t)$, and it is asymptotically equivalent to
the nonrobust LM test if $\hat\omega_t$ is correctly specified for
$V(y_t|x_t)$.
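
The two auxiliary-regression recipes, (4.2) and (4.3)-(4.4), can be sketched in a few lines. The following is only an illustration under stated assumptions: the function name `lm_statistics` and its argument names are hypothetical, the gradients and restricted residuals are taken as already computed, and ordinary least squares is carried out with NumPy's `lstsq`.

```python
import numpy as np

def lm_statistics(eps, grad_b, grad_a, w=None):
    """Nonrobust (N * R_u^2) and robust (N - SSR) LM statistics.

    eps    : (N,) restricted residuals (unweighted)
    grad_b : (N, P) gradient w.r.t. the parameters estimated under H0
    grad_a : (N,) or (N, Q) gradient w.r.t. the parameters fixed under H0
    w      : (N,) estimated variance weights omega-hat; ones if None
    """
    eps = np.asarray(eps, float)
    N = eps.shape[0]
    s = np.ones(N) if w is None else 1.0 / np.sqrt(np.asarray(w, float))
    e = eps * s                                    # weighted residuals
    Gb = np.asarray(grad_b, float) * s[:, None]
    Ga = np.asarray(grad_a, float).reshape(N, -1) * s[:, None]

    # (4.2): regress e on (Gb, Ga); statistic is N times the uncentered R^2.
    X = np.hstack([Gb, Ga])
    fitted = X @ np.linalg.lstsq(X, e, rcond=None)[0]
    lm = N * (fitted @ fitted) / (e @ e)

    # (4.3): residuals from regressing Ga on Gb.
    r = Ga - Gb @ np.linalg.lstsq(Gb, Ga, rcond=None)[0]

    # (4.4): regress 1 on e * r; statistic is N - SSR.
    Z = e[:, None] * r
    ones = np.ones(N)
    resid = ones - Z @ np.linalg.lstsq(Z, ones, rcond=None)[0]
    lm_robust = N - resid @ resid
    return lm, lm_robust
```

Both statistics would be compared with a chi-square critical value with Q degrees of freedom; only the second remains valid when the weights do not match V(y|x).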

Consider now testing the null hypothesis $H_0: \lambda = 1$ in the model (3.1).
Let $\delta \equiv (1+\beta_1, \beta_2, \beta_3, \dots, \beta_K)$ and assume that
$x_1 \equiv 1$ for expository purposes. Under $H_0$, (3.1) reduces to
$E(y|x) = x\delta$. Expressions (3.14) and (3.17) provide the gradient of the
regression function evaluated at the null hypothesis:

(4.5)  $\nabla_\theta\mu(x;\beta,1) = (x,\; x\beta - (1 + x\beta)\log(1 + x\beta))$.

Let $\hat\delta$ be the WLS estimator from the regression

(4.6)  $y_t$  on  1, $x_{t2}$, ..., $x_{tK}$,   t = 1,...,N

using weights $1/\sqrt{\hat\omega_t}$, and let the (unweighted) predicted values
and residuals be $\hat y_t \equiv x_t\hat\delta$,
$\hat\varepsilon_t \equiv y_t - \hat y_t$. The weighted residuals are
$\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$. Because
$\sum_{t=1}^N x_t'\hat\varepsilon_t/\hat\omega_t = 0$ and
$1 + x_t\hat\beta = x_t\hat\delta$, the usual LM statistic is obtained as
$NR_u^2$ from the regression

$\tilde\varepsilon_t$  on  $\tilde x_t$, $\tilde y_t\log(\hat y_t)$,   t = 1,2,...,N,

where $\tilde x_t \equiv x_t/\sqrt{\hat\omega_t}$ and
$\tilde y_t \equiv \hat y_t/\sqrt{\hat\omega_t}$. Under $H_0$ and
$V(y_t|x_t) = \sigma^2\omega_t(\gamma)$, $NR_u^2 \sim \chi^2_1$. When
$\hat\omega_t \equiv 1$ this reduces to the LM form of the t-test for the null
of a linear model against the Box-Cox alternative derived by Andrews (1971).
Because Andrews starts from the Box-Cox model, his derivation of the
statistic is different. Andrews suggests using the standard t-test in the
Box-Cox setup because, under $H_0: \lambda = 1$, y|x is normally distributed
and homoskedastic.

The failure of normality has no effect on the asymptotic size of the test,
but misspecification of the conditional variance function can bias the
inference toward or away from the linear conditional expectation. Because
the primary goal is to test hypotheses about E(y|x) it is prudent to use the
LM test that is robust to misspecification of V(y|x). Let $\tilde r_t$ be the
residuals from the regression

$\tilde y_t\log(\hat y_t)$  on  $\tilde x_t$,   t = 1,2,...,N.

Run the regression

1  on  $\tilde\varepsilon_t\tilde r_t$,   t = 1,2,...,N,

and use N - SSR as asymptotically $\chi^2_1$ under $H_0$.

If $\hat y_t$ is nonpositive for some t then the indicator
$\hat y_t\log(\hat y_t)$ is not defined. This may suggest that observation t
is an outlier; an alternative interpretation is that the hypothesis
$\lambda = 1$ is false.
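
As an illustration of the robust test of $\lambda = 1$, the sketch below fits the weighted linear regression, forms the indicator $\hat y_t\log(\hat y_t)$, and returns N - SSR with its chi-square(1) p-value. The function name `robust_lm_linear_null` is hypothetical, the regressor matrix is assumed to contain a column of ones, and the weights `w` stand for the $\hat\omega_t$ (ones if omitted).

```python
import math
import numpy as np

def robust_lm_linear_null(y, X, w=None):
    """Robust LM statistic and p-value for H0: lambda = 1 (linear E(y|x))."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    N = y.shape[0]
    s = np.ones(N) if w is None else 1.0 / np.sqrt(np.asarray(w, float))
    Xt = X * s[:, None]                                  # x-tilde
    delta = np.linalg.lstsq(Xt, y * s, rcond=None)[0]    # WLS estimate
    yhat = X @ delta                                     # unweighted fits
    if np.any(yhat <= 0):
        raise ValueError("nonpositive fitted value: yhat*log(yhat) undefined")
    e = (y - yhat) * s                                   # eps-tilde
    ind = yhat * np.log(yhat) * s                        # y-tilde * log(yhat)
    r = ind - Xt @ np.linalg.lstsq(Xt, ind, rcond=None)[0]
    z = e * r
    # With a single regressor z, N - SSR from regressing 1 on z equals
    # (sum z)^2 / (sum z^2).
    stat = z.sum() ** 2 / (z @ z)
    pval = math.erfc(math.sqrt(stat / 2.0))              # chi-square(1) tail
    return stat, pval
```

The function raises an error rather than dropping observations when a fitted value is nonpositive, leaving the outlier-versus-misspecification judgment to the user.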

As Davidson and MacKinnon (1985a) have emphasized, Andrews' test (based on
ordinary least squares rather than weighted least squares) is not optimal if
the standard Box-Cox model holds (it ignores the changing shape of the
distribution of y|x as $\lambda$ varies). But the optimal test is not robust
to violations of the normality and homoskedasticity assumptions for
$y(\lambda)$. The tests suggested by Davidson and MacKinnon (1985a) are joint
tests of the distributional and conditional mean assumptions imposed in the
Box-Cox model. The LM test for $\lambda = 1$ based on a more plausible
truncated normal distribution is difficult to compute and not robust to
failure of the truncated normal distributional assumption (see Poirier and
Ruud (1979, 1983)). Seaks and Layson (1983) provide strong evidence that
heteroskedasticity of $y(\lambda)$ in the Box-Cox model can seriously bias
estimates and test statistics. On the other hand, the robust form of
Andrews' test is easy to compute, maintains the correct asymptotic size under
misspecification of V(y|x), and is likely to have sufficient power for many
applications. Also, the robust form of the test is asymptotically equivalent
to the usual LM test if $\hat\omega_t$ happens to be correctly specified for
$V(y_t|x_t)$.

The LM test for $\lambda = 0$ simply requires weighted nonlinear least squares
estimation of an exponential regression function, which is relatively
straightforward. Let $\hat\beta$ be the WNLS estimator of $\beta$ in the model

(4.7)  $E(y_t|x_t) = \exp(x_t\beta)$

using weights $\hat\omega_t$. Thus, $\hat\beta$ solves

$\min_\beta \sum_{t=1}^N (y_t - \exp(x_t\beta))^2/\hat\omega_t$.

Let $\hat y_t \equiv \exp(x_t\hat\beta)$ be the fitted values, and let
$\hat\varepsilon_t \equiv y_t - \hat y_t$ be the residuals from the WNLS
estimation. Then, referring to (3.19), the LM statistic is based on the scalar

(4.8)  $\sum_{t=1}^N \exp(x_t\hat\beta)(x_t\hat\beta)^2\hat\varepsilon_t/\hat\omega_t$

or

(4.9)  $\sum_{t=1}^N \tilde y_t(\log\hat y_t)^2\tilde\varepsilon_t$,

where $\tilde y_t \equiv \hat y_t/\sqrt{\hat\omega_t}$ and
$\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$. The LM test
for $\lambda = 0$ if $V(y|x) = \sigma^2\omega(x;\gamma)$ is obtained as $NR_u^2$
from the regression

$\tilde\varepsilon_t$  on  $\tilde y_tx_t$, $\tilde y_t(\log\hat y_t)^2$,   t = 1,...,N.

The robust form of the test can be computed by first saving the residuals
$\tilde r_t$ from the regression

$\tilde y_t(\log\hat y_t)^2$  on  $\tilde y_tx_t$,   t = 1,...,N

and then calculating N - SSR from the regression

1  on  $\tilde\varepsilon_t\tilde r_t$,   t = 1,2,...,N;

N - SSR is asymptotically $\chi^2_1$ under $H_0$.
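
The steps for the $\lambda = 0$ test can be sketched as follows. This is a minimal illustration, not a production routine: the names `fit_exponential` and `robust_lm_exponential_null` are hypothetical, the WNLS problem is solved by a plain Gauss-Newton iteration with step halving (an assumption made here for self-containment; any nonlinear least squares routine would do), and starting values come from a regression of log(y) on x, which presumes y > 0.

```python
import numpy as np

def fit_exponential(y, X, w=None, iters=100):
    """WNLS of E(y|x) = exp(x b) by Gauss-Newton with step halving."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    s = np.ones(len(y)) if w is None else 1.0 / np.sqrt(np.asarray(w, float))

    def ssr(b):
        return np.sum(((y - np.exp(X @ b)) * s) ** 2)

    # Starting values: OLS of log(y) on x (requires y > 0 at this step only).
    b = np.linalg.lstsq(X, np.log(y), rcond=None)[0]
    for _ in range(iters):
        m = np.exp(X @ b)
        G = (m * s)[:, None] * X                 # weighted gradient wrt beta
        step = np.linalg.lstsq(G, (y - m) * s, rcond=None)[0]
        t = 1.0
        # Halve the step until the objective does not increase.
        while not ssr(b + t * step) <= ssr(b) and t > 1e-8:
            t *= 0.5
        b = b + t * step
        if np.max(np.abs(t * step)) < 1e-10:
            break
    return b

def robust_lm_exponential_null(y, X, w=None):
    """Robust LM statistic (N - SSR) for H0: lambda = 0."""
    y = np.asarray(y, float)
    X = np.asarray(X, float)
    s = np.ones(len(y)) if w is None else 1.0 / np.sqrt(np.asarray(w, float))
    b = fit_exponential(y, X, w)
    yhat = np.exp(X @ b)
    e = (y - yhat) * s                           # eps-tilde
    G = (yhat * s)[:, None] * X                  # y-tilde * x
    ind = yhat * s * np.log(yhat) ** 2           # y-tilde * (log yhat)^2
    r = ind - G @ np.linalg.lstsq(G, ind, rcond=None)[0]
    z = e * r
    return z.sum() ** 2 / (z @ z)                # equals N - SSR from 1 on z
```

The returned statistic would be compared with a chi-square(1) critical value, for example 3.84 at the 5% level.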

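The LM test for exclusion restrictions is also easy to derive. Let $z_t$ be
a 1×Q vector of additional variables, and consider testing
$H_0: \delta = 0$ in the model

(4.10)  $E(y_t|x_t,z_t) = [1 + \lambda x_t\beta + \lambda z_t\delta]^{1/\lambda}$

(note that $E(y_t|x_t,z_t) = E(y_t|x_t)$ under $H_0$). Let $\hat\beta$ and
$\hat\lambda$ be the estimates computed under $\delta = 0$, so that the fitted
values and residuals computed under $H_0$ are
$\hat y_t = [1 + \hat\lambda x_t\hat\beta]^{1/\hat\lambda}$ and
$\hat\varepsilon_t = y_t - \hat y_t$, respectively. Let $\nabla_\beta\hat\mu_t$
and $\nabla_\lambda\hat\mu_t$ be the gradients defined by (3.15) and (3.17).
The gradient with respect to $\delta$ evaluated under the null is

$\nabla_\delta\hat\mu_t = [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}z_t$.

If WNLS is used, each quantity is weighted by $1/\sqrt{\hat\omega_t}$ and the
LM test is obtained as $NR_u^2$ from the regression

(4.11)  $\tilde\varepsilon_t$  on  $\nabla_\beta\tilde\mu_t$, $\nabla_\lambda\tilde\mu_t$, $\nabla_\delta\tilde\mu_t$;

the robust LM test first requires the 1×Q residuals $\tilde r_t$ from

(4.12)  $\nabla_\delta\tilde\mu_t$  on  $\nabla_\beta\tilde\mu_t$, $\nabla_\lambda\tilde\mu_t$,

and using $\tilde r_t$ as in (4.4). Exclusion restriction tests when
$\lambda$ is fixed at $\lambda = 0$ or $\lambda = 1$ are even easier to compute.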

It is important to emphasize that tests about $\beta$ and $\lambda$ (and
$\delta$ in (4.10)) in the current setup are purely tests about E(y|x):
provided the robust forms of the tests are used, the null hypothesis imposes
no assumptions on V(y|x) or any other aspect of the distribution of y given
x. In contrast, tests about $\beta$ and $\lambda$ in the Box-Cox model are
generally tests about the entire distribution of y given x. It is quite
conceivable that one could reject the null hypothesis because the
distribution is misspecified even if the conditional mean is (implicitly)
correctly specified. The efficiency of the tests based on Box-Cox type
approaches comes at a substantial price. Not only are the tests nonrobust,
but it is not possible to isolate the hypothesis that E(y|x) is correctly
specified.

5.  A  Model  with  Box-Cox  Transformations  of  the  Explanatory  Variables 

The model presented in section 3 can easily incorporate Box-Cox
transformations of the explanatory variables (which is not qualitatively the
same as transforming y). Suppose that a Box-Cox transformation is to be
applied to the nonnegative variables $x_j$, j = J+1,...,K. Then (3.1) can be
extended to

(5.1)  $\mu(x;\theta) \equiv [1 + \lambda(\beta_1x_1 + \dots + \beta_Jx_J + \beta_{J+1}x_{J+1}(\rho_{J+1}) + \dots + \beta_Kx_K(\rho_K))]^{1/\lambda}$,   $\lambda \neq 0$

(5.2)  $\mu(x;\theta) \equiv \exp[\beta_1x_1 + \dots + \beta_Jx_J + \beta_{J+1}x_{J+1}(\rho_{J+1}) + \dots + \beta_Kx_K(\rho_K)]$,   $\lambda = 0$.

Here, $x_j(\rho_j)$ denotes the Box-Cox transformation of $x_j$ with parameter
$\rho_j$, as in (1.1) and (1.2). Equations (5.1) and (5.2) significantly
expand the range of nested special cases. If $\lambda = 0$ and $\rho_j = 0$,
$\partial\log E(y|x)/\partial\log(x_j) = \beta_j$, so that $\beta_j$ is the
elasticity of y with respect to $x_j$ (as defined in section 2). If
$\lambda = 1$ and $\rho_j = 0$ then $\beta_j = \partial E(y|x)/\partial\log(x_j)$,
so that $\beta_j/100$ approximately measures the change in the expected value
of y when $x_j$ increases by one percent.

Equation (5.1) also contains the CES production function as a special case.
If $x_1 \equiv 1$ and $x_2,\dots,x_K$ are nonnegative inputs (take J = 1), the
unrestricted form of (5.1) is

$E(y|x) = [1 + \lambda(\beta_1 + \beta_2x_2(\rho_2) + \dots + \beta_Kx_K(\rho_K))]^{1/\lambda}$.

The CES function is obtained under the restriction
$\lambda = \rho_2 = \dots = \rho_K$.
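
To make the nesting in (5.1) and (5.2) concrete, the following sketch evaluates the conditional mean for one observation. It assumes the usual Box-Cox form $(z^\rho - 1)/\rho$, with $\log z$ at $\rho = 0$, for the transformation referenced as (1.1)-(1.2); the function names `box_cox` and `cond_mean` and the split of regressors into `x_plain` and `x_bc` are hypothetical conveniences.

```python
import math

def box_cox(z, rho):
    """Box-Cox transform of z > 0: (z**rho - 1)/rho, or log(z) if rho == 0."""
    return math.log(z) if rho == 0 else (z ** rho - 1.0) / rho

def cond_mean(x_plain, x_bc, beta, lam, rho):
    """mu(x; theta) as in (5.1) for lam != 0 and (5.2) for lam == 0.

    x_plain : untransformed regressors x_1..x_J (include 1.0 for a constant)
    x_bc    : positive regressors x_{J+1}..x_K to be Box-Cox transformed
    beta    : K coefficients, ordered as x_plain then x_bc
    rho     : K-J transformation parameters, one per element of x_bc
    """
    x_rho = list(x_plain) + [box_cox(z, r) for z, r in zip(x_bc, rho)]
    index = sum(b * v for b, v in zip(beta, x_rho))
    if lam == 0:
        return math.exp(index)
    base = 1.0 + lam * index
    if base <= 0:
        raise ValueError("1 + lambda * x(rho) beta must be positive")
    return base ** (1.0 / lam)
```

With `lam = 1` and every `rho` equal to 1 the function returns a linear index in x, and with `lam = 0` and every `rho` equal to 0 it returns the constant elasticity form.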

The exponential form (5.2) has the advantage of ensuring that the predicted
values are well-defined and positive without any restrictions on the
parameters (except $\rho_j > 0$ if $P(x_j = 0) > 0$). Such a model is
reasonably easy to estimate. Also, (5.2) includes the constant elasticity and
constant semi-elasticity cases as restricted versions. These models have been
fairly popular in applied econometric studies, particularly in models of
count data (see Hausman, Hall, and Griliches (1984) and Papke (1989)). The LM
test for $\lambda = 0$ developed in section 4 and the extensions discussed
below might be useful specification tests of the exponential model with
$\hat\omega_t = \exp(x_t\hat\beta)$.

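The derivatives of $\mu(x;\theta)$ with respect to the parameters $\beta$ and
$\lambda$ are similar to those already obtained, with the exception that
$x_{J+1},\dots,x_K$ are replaced by $x_{J+1}(\rho_{J+1}),\dots,x_K(\rho_K)$.
Let $x(\rho)$ denote the 1×K vector

$x(\rho) \equiv (x_1,\dots,x_J,\,x_{J+1}(\rho_{J+1}),\dots,x_K(\rho_K))$

so that

$\mu(x;\theta) = [1 + \lambda x(\rho)\beta]^{1/\lambda}$,   $\lambda \neq 0$
$\mu(x;\theta) = \exp[x(\rho)\beta]$,   $\lambda = 0$.

Then, by (3.14) and (3.16), for $\lambda \neq 0$

(5.3)  $\nabla_\beta\mu(x;\beta,\lambda,\rho) = [1 + \lambda x(\rho)\beta]^{(1/\lambda)-1}x(\rho)$

(5.4)  $\nabla_\lambda\mu(x;\beta,\lambda,\rho) = [1 + \lambda x(\rho)\beta]^{1/\lambda}\,[\lambda x(\rho)\beta - (1 + \lambda x(\rho)\beta)\log(1 + \lambda x(\rho)\beta)]\,/\,[\lambda^2(1 + \lambda x(\rho)\beta)]$.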
To obtain the derivative with respect to $\rho_j$, j = J+1,...,K, note that
if $z(\rho)$ denotes the Box-Cox transformation of z, then

$dz(\rho)/d\rho = z(\rho)\log(z) - (z(\rho) - \log(z))/\rho$,   $\rho \neq 0$
$dz(\rho)/d\rho = (\log z)^2/2$,   $\rho = 0$.

Thus, for $\lambda \neq 0$, if $\nabla_\rho x(\rho)$ denotes the (K-J)×K gradient
of $x(\rho)$ with respect to $\rho \equiv (\rho_{J+1},\dots,\rho_K)$ then the
1×(K-J) gradient of $\mu$ with respect to $\rho$ is

(5.5)  $\nabla_\rho\mu(x;\beta,\lambda,\rho) = [1 + \lambda x(\rho)\beta]^{(1/\lambda)-1}\beta'\nabla_\rho x(\rho)'$.

The first J columns of $\nabla_\rho x(\rho)$ are zero while the only nonzero
element in the (J+i)th column is in the ith row; this term is equal to

$x_{J+i}(\rho_{J+i})\log(x_{J+i}) - (x_{J+i}(\rho_{J+i}) - \log(x_{J+i}))/\rho_{J+i}$

or, if $\rho_{J+i} = 0$, $(\log x_{J+i})^2/2$, i = 1,...,K-J.
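
The derivative of the Box-Cox transformation with respect to its parameter is easy to check numerically. The sketch below assumes the usual form $z(\rho) = (z^\rho - 1)/\rho$ with $z(0) = \log z$; the function names are hypothetical.

```python
import math

def box_cox(z, rho):
    """Box-Cox transform of z > 0."""
    return math.log(z) if rho == 0 else (z ** rho - 1.0) / rho

def d_box_cox_d_rho(z, rho):
    """Derivative of the Box-Cox transform with respect to rho."""
    if rho == 0:
        return 0.5 * math.log(z) ** 2
    z_rho = box_cox(z, rho)
    return z_rho * math.log(z) - (z_rho - math.log(z)) / rho

def central_difference(z, rho, h=1e-4):
    """Numerical derivative in rho, used only as a cross-check."""
    return (box_cox(z, rho + h) - box_cox(z, rho - h)) / (2.0 * h)
```

At $\rho = 1$ the analytic expression collapses to $z\log z - (z - 1)$, which is the form that appears under the null of a linear model.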

The derivatives for $\lambda = 0$ are easily seen to be

(5.6)  $\nabla_\beta\mu(x;\beta,0,\rho) = \exp[x(\rho)\beta]\,x(\rho)$

(5.7)  $\nabla_\lambda\mu(x;\beta,0,\rho) = -\exp[x(\rho)\beta]\,[x(\rho)\beta]^2/2$

(5.8)  $\nabla_\rho\mu(x;\beta,0,\rho) = \exp[x(\rho)\beta]\,\beta'\nabla_\rho x(\rho)'$.

The K+1+(K-J) parameter vector $\theta \equiv (\beta',\lambda,\rho')'$ can be
estimated by WNLS as in section 3. The asymptotic variance and its estimator
derived there are still valid once the gradient is redefined to be

$\nabla_\theta\mu_t(\theta) \equiv [\nabla_\beta\mu_t(\beta,\lambda,\rho),\,\nabla_\lambda\mu_t(\beta,\lambda,\rho),\,\nabla_\rho\mu_t(\beta,\lambda,\rho)]$.

The asymptotic variance of any restricted version is obtained by calculating
the gradient of $\mu_t(\theta)$ with respect to the unrestricted elements. Two
cases of particular interest are $\lambda = 1$ and $\lambda = 0$. In these
cases $\theta$ contains only $(\beta,\rho)$ and the gradients are given by
(5.3) and (5.5) ($\lambda = 1$) or (5.6) and (5.8) ($\lambda = 0$).
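
The $\lambda = 0$ gradient (5.7) can be recovered as the limit of (5.4). The short derivation below is a sketch that writes $u$ for the index $x(\rho)\beta$, a shorthand introduced here only for this purpose, and uses the series expansion of $\log(1 + \lambda u)$ for $|\lambda u| < 1$.

```latex
\begin{align*}
\log\mu &= \tfrac{1}{\lambda}\log(1+\lambda u)
         = u - \tfrac{\lambda u^{2}}{2} + \tfrac{\lambda^{2}u^{3}}{3} - \cdots,
         \qquad u \equiv x(\rho)\beta, \\
\frac{\partial\log\mu}{\partial\lambda}
        &= -\tfrac{u^{2}}{2} + \tfrac{2\lambda u^{3}}{3} - \cdots
         \;\longrightarrow\; -\tfrac{u^{2}}{2} \quad (\lambda \to 0), \\
\nabla_\lambda\mu &= \mu\,\frac{\partial\log\mu}{\partial\lambda}
         \;\longrightarrow\; -\exp(u)\,\frac{u^{2}}{2}.
\end{align*}
```

Since $\exp(u)$ is the fitted value under $\lambda = 0$ and $u$ is its logarithm, the limit is proportional to the indicator $\hat y_t(\log\hat y_t)^2$ used in the test of the exponential model.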

Several models used by applied economists are special cases of (5.1) or
(5.2). The null of a linear model is stated as

$H_0: \lambda = 1,\ \rho_j = 1$,   j = J+1,...,K,

and the constrained derivatives are

$\nabla_\beta\mu(x;\beta,1,1) = x(1) = (x_1,\dots,x_J,\,x_{J+1}-1,\dots,x_K-1)$

$\nabla_\lambda\mu(x;\beta,1,1) = x(1)\beta - (1 + x(1)\beta)\log(1 + x(1)\beta)$

$\nabla_\rho\mu(x;\beta,1,1) = \beta'\nabla_\rho x(1)' = (\beta_{J+1}[x_{J+1}\log(x_{J+1}) - (x_{J+1}-1)],\,\dots,\,\beta_K[x_K\log(x_K) - (x_K-1)])$.

Suppose that $x_t$ contains unity and let $\hat\delta$ denote the WLS estimator
from the regression

$y_t$  on  $x_t$,   t = 1,...,N,   using weights $\hat\omega_t$.

Define $\hat y_t \equiv x_t\hat\delta$, $\hat\varepsilon_t \equiv y_t - \hat y_t$.
Because $\sum_{t=1}^N x_t'\hat\varepsilon_t/\hat\omega_t = 0$, the LM test is
based on the K-J+1 sample covariances

$\sum_{t=1}^N \hat y_t\log(\hat y_t)\hat\varepsilon_t/\hat\omega_t \equiv \sum_{t=1}^N \tilde y_t\log(\hat y_t)\tilde\varepsilon_t$

$\sum_{t=1}^N x_{tj}\log(x_{tj})\hat\varepsilon_t/\hat\omega_t \equiv \sum_{t=1}^N \tilde x_{tj}\log(x_{tj})\tilde\varepsilon_t$,   j = J+1,...,K

where quantities with a "~" are the corresponding "^" quantities weighted by
$1/\sqrt{\hat\omega_t}$. If $V(y_t|x_t) = \sigma^2\omega_t(\gamma)$ under $H_0$
then the LM test is simply $NR_u^2$ from the regression

$\tilde\varepsilon_t$  on  $\tilde x_t$, $\tilde y_t\log(\hat y_t)$, $\tilde x_{t,J+1}\log(x_{t,J+1})$, ..., $\tilde x_{tK}\log(x_{tK})$;

$NR_u^2 \sim \chi^2_{K-J+1}$ under $H_0$. It is probably best to use the form
that is robust to second moment misspecification. Define the 1×(K-J+1)
vector

$\hat\xi_t \equiv (\hat y_t\log(\hat y_t),\,x_{t,J+1}\log(x_{t,J+1}),\,\dots,\,x_{tK}\log(x_{tK}))$,

$\tilde\xi_t \equiv \hat\xi_t/\sqrt{\hat\omega_t}$, and
$\tilde x_t \equiv x_t/\sqrt{\hat\omega_t}$. Then regress

$\tilde\xi_t$  on  $\tilde x_t$

and save the K-J+1 residuals, say $\tilde r_t$. Then $NR_u^2 = N - SSR$ from
the regression

(5.9)  1  on  $\tilde\varepsilon_t\tilde r_t$,   t = 1,...,N

is asymptotically $\chi^2_{K-J+1}$ under $H_0$ (which does not impose
$V(y|x) = \sigma^2\omega(x;\gamma)$).

If the linear model is rejected one might test the less restrictive
hypothesis $H_0: \lambda = 1$. This is a one degree of freedom test with
misspecification indicator

$x_t(\hat\rho)\hat\beta - (1 + x_t(\hat\rho)\hat\beta)\log(1 + x_t(\hat\rho)\hat\beta)$,

where $\hat\beta$ and $\hat\rho$ now denote the estimators computed under the
single restriction $\lambda = 1$. Let
$\hat y_t \equiv 1 + x_t(\hat\rho)\hat\beta$,
$\hat\varepsilon_t \equiv y_t - \hat y_t$, $\hat x_t \equiv x_t(\hat\rho)$, and
$\nabla_\rho\hat x_t \equiv (\nabla_\rho\hat x_{t,J+1},\dots,\nabla_\rho\hat x_{tK})$
(a 1×(K-J) vector). Then the usual LM test involves regressing

$\tilde\varepsilon_t$  on  $\tilde x_t$, $\nabla_\rho\tilde x_t$, $\tilde y_t\log(\hat y_t)$

and using $NR_u^2$ as $\chi^2_1$. The robust form uses N - SSR from the
regression

1  on  $\tilde\varepsilon_t\tilde r_t$,

where $\tilde r_t$ are the scalar residuals from the regression

$\tilde y_t\log(\hat y_t)$  on  $\tilde x_t$, $\nabla_\rho\tilde x_t$.

Under $\lambda = 0$, $\rho_j = 0$, j = J+1,...,K, the model reduces to

$E(y|x) = \exp[\beta_1x_1 + \dots + \beta_Jx_J + \beta_{J+1}\log(x_{J+1}) + \dots + \beta_K\log(x_K)]$,

so that $\beta_j$ for $j \geq J+1$ is an elasticity while $100\beta_j$ for
$j \leq J$ approximately measures the percentage change in E(y|x) when $x_j$
increases by one unit. It is traditional to estimate these quantities from
the regression

$\log(y_t)$  on  $x_{t1},\dots,x_{tJ}$, $\log(x_{t,J+1}),\dots,\log(x_{tK})$

but, as pointed out in section 2, the two procedures need not give the same
answer, even asymptotically. Let $\hat\beta$ denote the WNLS estimator of
$\beta$ from the weighted nonlinear regression

$y_t$  on  $\exp[\beta_1 + \beta_2x_{t2} + \dots + \beta_Jx_{tJ} + \beta_{J+1}\log(x_{t,J+1}) + \dots + \beta_K\log(x_{tK})]$

using weights $\hat\omega_t$, and let $\hat y_t$ and $\hat\varepsilon_t$ be the
(unweighted) fitted values and residuals ($x_{t1}$ has been set to unity).
Evaluating the gradient of $\mu(x_t;\beta,\lambda,\rho)$ at $(\hat\beta,0,0)$
and weighting all quantities by $1/\sqrt{\hat\omega_t}$ leads to the auxiliary
regression

$\tilde\varepsilon_t$  on  $\tilde y_tx_t(0)$, $\tilde y_t(\log\hat y_t)^2$, $\tilde y_t(\log x_{t,J+1})^2$, ..., $\tilde y_t(\log x_{tK})^2$

where $x_t(0) \equiv (1, x_{t2},\dots,x_{tJ}, \log(x_{t,J+1}),\dots,\log(x_{tK}))$.
The LM statistic $NR_u^2$ from this regression is $\chi^2_{K-J+1}$ under $H_0$
if $V(y|x) = \sigma^2\omega(x;\gamma)$. The robust test is obtained by defining

$\tilde\xi_t \equiv (\tilde y_t(\log\hat y_t)^2,\,\tilde y_t(\log x_{t,J+1})^2,\,\dots,\,\tilde y_t(\log x_{tK})^2)$,

regressing $\tilde\xi_t$ on $\tilde y_tx_t(0)$ and saving the residuals
$\tilde r_t$, and using $\tilde\varepsilon_t$ and $\tilde r_t$ as in (5.9).

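The test for $H_0: \lambda = 0$ uses the scalar weighted indicator
$\tilde\xi_t \equiv \tilde y_t(\log\hat y_t)^2$, where the fitted values are
now $\hat y_t \equiv \exp[x_t(\hat\rho)\hat\beta]$, $\hat\beta$ and $\hat\rho$
are now computed from WNLS of (5.2), and
$\tilde y_t \equiv \hat y_t/\sqrt{\hat\omega_t}$. As usual, the residuals are
$\hat\varepsilon_t \equiv y_t - \hat y_t$ and
$\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$. Let
$\tilde r_t$ be the residuals from the regression

$\tilde\xi_t$  on  $\tilde y_t\hat x_t$, $\tilde y_t\nabla_\rho\hat x_t$,

and then use N - SSR from 1 on $\tilde\varepsilon_t\tilde r_t$ as $\chi^2_1$
under $H_0$.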

LM tests of other restrictions can be obtained by computing the residuals
and gradients under the null hypothesis and following procedures analogous
to those outlined above. The hypothesis $H_0: \lambda = 0$, $\rho_j = 1$,
j = J+1,...,K is likely to be of general interest;
$H_0: \lambda = \rho_2 = \dots = \rho_K$ is of interest in the CES example.
Testing the exclusion restrictions $H_0: \delta = 0$ in the model

$E(y_t|x_t,z_t) = [1 + \lambda x_t(\rho)\beta + \lambda z_t\delta]^{1/\lambda}$

where $z_t$ is 1×Q, is similar to the case covered at the end of section 4,
except that the gradient $\nabla_\rho\tilde\mu_t$ (see (5.5)) must be included
in (4.11) (nonrobust test) or (4.12) (robust test). Note that the variables
$z_t$ cannot themselves be transformed because the transformation parameters
are not identified under $\delta = 0$.


6.  On  the  Issue  of  Scale  Invariance 

One feature of the model (3.1) (and the more general model (5.1)) might make
some researchers uncomfortable: the t-statistics for the slope coefficients
$\hat\beta_2,\dots,\hat\beta_K$ are not invariant to the scaling of $y_t$. This
is in contrast to the case of linear regression or exponential regression,
where, for example, it does not matter for purposes of inference whether y is
measured in hundreds or thousands of dollars. In the linear case the
coefficients are scaled up or down but the t-statistics are unchanged. For
the exponential regression model (3.2) it is easy to see that only the
constant term $\beta_1$ changes when y is scaled, and the standard errors of
all coefficients are invariant. Unfortunately, this does not carry over to
the general model (3.1) when $\lambda$ is estimated along with $\beta$.
Spitzer (1984) pointed out the analogous feature for the standard Box-Cox
model.

Focusing on (3.1), suppose that $(\hat\beta,\hat\lambda)$ are the NLS (or WNLS)
estimates using $y_t$ as the regressand, and let
$(\hat\beta^+,\hat\lambda^+)$ be the corresponding estimates when the
regressand is $c_0y_t$ for some $c_0 > 0$. In what follows it is assumed that
$x_{t1} \equiv 1$. As shown in the appendix, if the estimates are unique then
they must satisfy

(6.1)  $\hat\lambda^+ = \hat\lambda$;   $\hat\beta_1^+ = (c_0^{\hat\lambda} - 1)\hat\lambda^{-1} + c_0^{\hat\lambda}\hat\beta_1$;   $\hat\beta_j^+ = c_0^{\hat\lambda}\hat\beta_j$, j = 2,...,K.

The estimate of $\lambda$ is invariant to the rescaling of $y_t$, and the
estimates of the other coefficients change so that the fitted values and
residuals in the scaled regression are simply scaled up versions of the
fitted values and residuals in the unscaled regression. Further, using (6.1)
it follows that

(6.2)  $1 + \hat\lambda^+x_t\hat\beta^+ = c_0^{\hat\lambda}[1 + \hat\lambda x_t\hat\beta]$;

plugging this into (3.14) yields

(6.3)  $\nabla_\beta\mu_t(\hat\beta^+,\hat\lambda^+) = c_0^{1-\hat\lambda}\,\nabla_\beta\mu_t(\hat\beta,\hat\lambda)$.

Because the residuals are scaled up by $c_0$ and the coefficients are related
by (6.1), (6.3) might lead one to believe that t-statistics of the slope
coefficients $\hat\beta_2,\dots,\hat\beta_K$ are scale invariant. This is
indeed true if $\lambda$ has been fixed at an a priori value (e.g.
$\lambda = 0$ or 1/2 or 1) rather than estimated. However, the gradients with
respect to $\lambda$ for the scaled and unscaled models are related by

(6.4)  $\nabla_\lambda\mu_t(\hat\beta^+,\hat\lambda^+) = c_0\nabla_\lambda\mu_t(\hat\beta,\hat\lambda) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}\,[c_0^{\hat\lambda} - 1 - \hat\lambda\log(c_0)\,c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta)]$.

Although the second term on the right hand side of (6.4) has zero sample
average (by the first order condition for $(\hat\beta,\hat\lambda)$), this
term gets squared and then summed in the computation of the standard errors;
consequently, the standard errors of $\hat\beta_2,\dots,\hat\beta_K$ are not
simply scaled by $c_0^{\hat\lambda}$ when $\lambda$ has been estimated along
with $\beta$. Interestingly, as shown in the appendix, the lagrange
multiplier statistic for exclusion of any 1×Q vector $z_t$ (see model (4.10))
is invariant to the scaling of $y_t$. Consequently, the LM test for the null

$H_0: \beta_j = 0$

(j = 2,...,K) is scale invariant, whereas the Wald test (based on the
t-statistic) is not.
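
The algebra linking the scaled and unscaled fits can be verified numerically. The sketch below is only a check under stated assumptions: the first regressor is a constant, $\lambda \neq 0$, the $\lambda$-gradient is the section 3 expression quoted in (5.4) with x in place of x(ρ), and the function names are hypothetical.

```python
import math

def mu(xb, lam):
    """Conditional mean [1 + lam * xb]^(1/lam), lam != 0."""
    return (1.0 + lam * xb) ** (1.0 / lam)

def grad_lam(xb, lam):
    """Derivative of mu with respect to lam, holding x*beta fixed."""
    w = 1.0 + lam * xb
    return mu(xb, lam) * (lam * xb - w * math.log(w)) / (lam ** 2 * w)

def rescale_beta(beta, lam, c0):
    """Coefficients after replacing y by c0*y; beta[0] is the intercept."""
    k = c0 ** lam
    return [(k - 1.0) / lam + k * beta[0]] + [k * b for b in beta[1:]]

def scale_correction_term(xb, lam, c0):
    """Extra term in the lam-gradient of the scaled model, beyond c0*grad."""
    w = 1.0 + lam * xb
    bracket = c0 ** lam - 1.0 - lam * math.log(c0) * c0 ** lam * w
    return lam ** -2 * c0 ** (1.0 - lam) * w ** (1.0 / lam - 1.0) * bracket
```

For example, with `beta = [0.3, 0.7]`, `lam = 0.5`, `c0 = 10` and `x = [1, 2]`, the rescaled coefficients reproduce fitted values that are exactly ten times the original ones, while the $\lambda$-gradient picks up the additional term.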

For the standard Box-Cox model, Schlesselman (1971) argued that the estimate
and standard error of $\lambda$ are scale invariant. This is also the case
here; in fact, the robust standard error is also scale invariant. Both of
these assertions are proven in the appendix.

One solution to the scale invariance problem for the t-statistics is to add
an additional scaling parameter to the conditional mean model. In place of
(3.1) consider the model

(6.5)  $E(y|x) = \nu[1 + \lambda x\beta]^{1/\lambda}$,

where $\nu \equiv \exp[E(\log y)]$ is the population geometric mean of y (this
requires P(y > 0) = 1). Then, scaling y up or down simply alters the scale
parameter $\nu$; $\beta$ and $\lambda$ are unchanged. However, model (6.5)
cannot be directly estimated by NLS because, if x contains unity, the
parameters $\beta$, $\lambda$, and $\nu$ are not separately identifiable from
the NLS objective function. Nevertheless, (6.5) can be easily operationalized
by replacing $\nu$ by its sample counterpart
$\hat\nu \equiv \exp[N^{-1}\sum_{t=1}^N \log(y_t)]$, and $\beta$ and $\lambda$
can be estimated by solving

(6.6)  $\min_{\beta,\lambda} \sum_{t=1}^N (y_t/\hat\nu - [1 + \lambda x_t\beta]^{1/\lambda})^2$;

each observation on $y_t$ is simply divided by the sample geometric mean of
$\{y_t: t = 1,\dots,N\}$, and then the model in section 3 is estimated. The
estimates of $\beta$ are trivially scale invariant because $y_t/\hat\nu$ is
invariant to scaling. Spitzer (1984) recommends the same strategy for the
Box-Cox model. There is, however, a somewhat subtle issue that needs to be
addressed in implementing this procedure. The solutions $\hat\beta$ and
$\hat\lambda$ to (6.6) depend on the estimator $\hat\nu$. Although it is
tempting to ignore the randomness of $\hat\nu$, the estimator of the
asymptotic variance of $\hat\theta \equiv (\hat\beta,\hat\lambda)$ should
reflect this additional source of uncertainty (as different samples of
$\{y_t\}$ are obtained, the estimator $\hat\nu$ generally changes). In the
general WNLS case, the easiest approach to this problem is to view
$\hat\theta$ as a "two-step" estimator that solves

(6.7)  $\min_\theta \sum_{t=1}^N (y_t - \mu(x_t;\theta,\hat\nu))^2/\hat\omega_t$

where

$\mu(x;\theta,\nu) \equiv \nu[1 + \lambda x\beta]^{1/\lambda}$.
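
The first-stage scaling is simple to illustrate. The sketch below computes the sample geometric mean and the rescaled regressand; the function names are hypothetical, and strictly positive observations are assumed, as the text requires.

```python
import math

def geometric_mean(values):
    """Sample geometric mean exp(mean(log(v))); requires every v > 0."""
    return math.exp(sum(math.log(v) for v in values) / len(values))

def scale_by_geometric_mean(values):
    """Return (values / nu_hat, nu_hat), where nu_hat is the geometric mean."""
    nu_hat = geometric_mean(values)
    return [v / nu_hat for v in values], nu_hat
```

Because the geometric mean of $c_0y_t$ is $c_0$ times that of $y_t$, the rescaled series is the same whether y is recorded in hundreds or in thousands of dollars; only `nu_hat` changes.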

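The appendix derives the asymptotic variance of the solution $\hat\theta$ of
(6.7) which accounts for the variability of $\hat\nu$. Define the (K+1)×1
vector

$\hat C_N \equiv N^{-1}\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\,\nabla_\nu\tilde\mu_t$,

where $\nabla_\theta\hat\mu_t$ is the same as derived in section 3 except that
it is now multiplied by $\hat\nu$; also, note that
$\nabla_\nu\hat\mu_t = [1 + \hat\lambda x_t\hat\beta]^{1/\hat\lambda}$ is simply
the fitted value for the scaled regressand $y_t/\hat\nu$. A consistent
estimate of the asymptotic variance of $\hat\theta$ is

(6.8)  $\left[\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right]^{-1}[I_P \mid -\hat C_N]\left[\sum_{t=1}^N \hat s_t'\hat s_t\right][I_P \mid -\hat C_N]'\left[\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right]^{-1}$

where P ≡ K+1, $\hat s_t$ is the 1×(P+1) vector

$\hat s_t \equiv (\tilde\varepsilon_t\nabla_\theta\tilde\mu_t,\ \hat\nu\log(y_t/\hat\nu))$,

$\hat\varepsilon_t \equiv y_t - \mu_t(\hat\theta,\hat\nu) \equiv y_t - \hat\nu[1 + \hat\lambda x_t\hat\beta]^{1/\hat\lambda}$,
$\nabla_\theta\tilde\mu_t \equiv \nabla_\theta\hat\mu_t/\sqrt{\hat\omega_t}$, and
$\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$. The
estimator (6.8) is also robust to variance misspecification
(heteroskedasticity when $\hat\omega_t \equiv 1$). A degrees of freedom
adjustment would scale (6.8) up by the factor N/(N-P). Note that in the
construction of $\hat s_t$, $\hat\nu\log(y_t/\hat\nu)$ is not weighted by
$1/\sqrt{\hat\omega_t}$.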

Generally speaking, (6.8) differs from the usual robust covariance matrix
estimator

(6.9)  $\left[\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right]^{-1}\left[\sum_{t=1}^N \tilde\varepsilon_t^2\,\nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right]\left[\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right]^{-1}$;

however, as shown in the appendix, (6.8) and (6.9) produce numerically
identical estimates for $se(\hat\lambda)$; this is as it should be because
the theoretical asymptotic variance of $\hat\lambda$ is unaffected by the
estimation of $\nu$. The scale corrected standard errors of
$\hat\beta_1,\dots,\hat\beta_K$ from (6.8) will generally be different from
those obtained from (6.9), reflecting the influence of the variation in the
estimator $\hat\nu$.

Similar conclusions can be obtained for the general model (5.1) and (5.2).
As in section 5, let $x_{tj}$, j = J+1,...,K denote strictly positive
regressors to which Box-Cox transformations are to be applied. Then the
t-statistics of $\hat\beta_{J+1},\dots,\hat\beta_K$ are not invariant to the
scaling of $x_{t,J+1},\dots,x_{tK}$, even if y is not scaled. The estimates of
$\beta_2,\dots,\beta_J$ are invariant, as are the associated t-statistics;
$\hat\beta_1$ and its t-statistic are essentially never invariant. It can be
shown that if $\rho_j$ is fixed at a particular value, rather than estimated,
then $t(\hat\beta_j)$ is invariant to scaling of $x_j$. The estimates of
$\lambda, \rho_{J+1},\dots,\rho_K$ are invariant to scaling y and/or
$x_{J+1},\dots,x_K$, as is the asymptotic variance matrix of the subvector
$(\hat\lambda,\hat\rho_{J+1},\dots,\hat\rho_K)'$. Again, the appendix verifies
these claims.

Lagrange multiplier tests for any exclusion restrictions are again scale
invariant, so these can be used as alternatives to testing the exclusion of
particular variables via t-statistics. The appendix contains a proof of this
assertion.

The scaled version of (5.1) becomes

(6.10)  $\mu(x;\theta,\nu,\eta) \equiv \nu[1 + \lambda x(\rho,\eta)\beta]^{1/\lambda}$,   $\lambda \neq 0$

(6.11)  $\mu(x;\theta,\nu,\eta) \equiv \exp[x(\rho,\eta)\beta]$,   $\lambda = 0$

where
$x(\rho,\eta) \equiv (x_1,x_2,\dots,x_J,\,x_{J+1}(\rho_{J+1},\eta_{J+1}),\dots,x_K(\rho_K,\eta_K))$,
$x_j(\rho_j,\eta_j)$ denotes the Box-Cox transformation of the scaled variable
$x_j/\eta_j$, $\eta_j \equiv \exp[E(\log x_j)]$ is the population geometric
mean of $x_j$, j = J+1,...,K, and $\theta \equiv (\beta',\lambda,\rho')'$ is
now a [K+1+(K-J)]×1 vector (define P ≡ K+1+(K-J)). The first stage estimators
are now $\hat\nu$, the sample geometric mean of $\{y_t\}$, and $\hat\eta_j$,
j = J+1,...,K, the sample geometric means of $\{x_{tj}\}$, j = J+1,...,K. The
WNLS estimator now solves

(6.12)  $\min_\theta \sum_{t=1}^N (y_t - \mu(x_t;\theta,\hat\nu,\hat\eta))^2/\hat\omega_t$,

which is algebraically the same as first scaling y and $x_{J+1},\dots,x_K$ by
their sample geometric means and estimating the regression function as in
section 5. Collecting the "nuisance" parameters into the [1+(K-J)]×1 vector
$\pi \equiv (\nu,\eta')'$, the gradient $\nabla_\pi\mu_t(\theta,\pi)$ is needed
to compute the correct asymptotic covariance matrix. But (for
$\lambda \neq 0$),
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(6.13) 
(6.14) 


V^M^C^tt)  =  [1  +  Ax^(p,r?)^ 


1/A 


V  fiCe.ir)    =  -u[l    +  Ax  (p,r?)/3]'^^/^^"-^'(6  A  )(x   /r?  )^j, 

fy.t  t  J  J   tj  J 

j=J+i K. 
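The gradients in (6.13) and (6.14) can be checked numerically. The following is a minimal sketch, assuming the scaled mean function (6.10) with $\lambda \neq 0$; the function names and the split of the regressors into an untransformed part (`x_lin`) and a Box-Cox part (`x_bc`) are illustrative choices, not notation from the paper.

```python
import math


def geometric_mean(values):
    """Sample geometric mean exp(mean(log v)); values must be strictly positive."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


def box_cox(u, rho):
    """Box-Cox transform (u**rho - 1)/rho, with log(u) as the rho = 0 limit."""
    return math.log(u) if rho == 0 else (u ** rho - 1.0) / rho


def base(x_lin, x_bc, beta_lin, beta_bc, lam, rho, eta):
    """1 + lam * x(rho, eta) beta; each x_bc entry is divided by its eta first."""
    index = sum(b * x for b, x in zip(beta_lin, x_lin))
    index += sum(b * box_cox(x / e, r)
                 for b, x, e, r in zip(beta_bc, x_bc, eta, rho))
    return 1.0 + lam * index


def mu(x_lin, x_bc, beta_lin, beta_bc, lam, rho, nu, eta):
    """Scaled mean function nu * [1 + lam * x beta]**(1/lam); lam must be nonzero."""
    return nu * base(x_lin, x_bc, beta_lin, beta_bc, lam, rho, eta) ** (1.0 / lam)


def grad_nu(x_lin, x_bc, beta_lin, beta_bc, lam, rho, nu, eta):
    """Derivative of mu with respect to nu, following (6.13)."""
    return base(x_lin, x_bc, beta_lin, beta_bc, lam, rho, eta) ** (1.0 / lam)


def grad_eta(j, x_lin, x_bc, beta_lin, beta_bc, lam, rho, nu, eta):
    """Derivative of mu with respect to eta[j], following (6.14)."""
    b = base(x_lin, x_bc, beta_lin, beta_bc, lam, rho, eta)
    return (-nu * b ** (1.0 / lam - 1.0)
            * (beta_bc[j] / eta[j]) * (x_bc[j] / eta[j]) ** rho[j])
```

In practice `nu` and `eta` would be replaced by `geometric_mean` applied to the sample of $y_t$ and of each $x_{tj}$, $j=J+1,\ldots,K$; comparing `grad_nu` and `grad_eta` with central finite differences of `mu` is a quick way to guard against sign or exponent slips.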


Let $\nabla_\pi\mu_t(\theta,\pi)$ denote the $1 \times [(K-J)+1]$ row vector consisting of these elements.

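Let $\hat C_N$ now denote the $P \times (1+K-J)$ matrix

(6.15)  $\hat C_N \equiv N^{-1}\sum_{t=1}^N (\nabla_\theta\hat\mu_t/\sqrt{\hat\omega_t})'(\nabla_\pi\hat\mu_t/\sqrt{\hat\omega_t})$,

and let $\hat s_t$ denote the $1 \times [P+(1+K-J)]$ vector

$\hat s_t \equiv (\tilde\varepsilon_t\nabla_\theta\tilde\mu_t,\ \hat\nu\log(y_t/\hat\nu),\ \hat\eta_{J+1}\log(x_{t,J+1}/\hat\eta_{J+1}),\ldots,\hat\eta_K\log(x_{tK}/\hat\eta_K))$.

As usual, $\hat\varepsilon_t \equiv y_t - \mu_t(\hat\theta,\hat\nu,\hat\eta) = y_t - \hat\nu[1 + \hat\lambda x_t(\hat\rho,\hat\eta)\hat\beta]^{1/\hat\lambda}$, $\tilde\varepsilon_t \equiv \hat\varepsilon_t/\sqrt{\hat\omega_t}$, and $\nabla_\theta\tilde\mu_t \equiv \nabla_\theta\hat\mu_t/\sqrt{\hat\omega_t}$. Then a consistent estimator of the asymptotic variance of the WNLS estimator $\hat\theta$ is

(6.16)  $\left(\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right)^{-1}[I_P\,|\,-\hat C_N]\left(\sum_{t=1}^N \hat s_t'\hat s_t\right)[I_P\,|\,-\hat C_N]'\left(\sum_{t=1}^N \nabla_\theta\tilde\mu_t'\nabla_\theta\tilde\mu_t\right)^{-1}$,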


which is the same form as (6.8) once $\nabla_\theta\mu_t$, $s_t$, and $C_N$ have been appropriately modified. As with the simple model (3.1), scaling $y_t$ and/or $x_{tj}$, $j=J+1,\ldots,K$, does not affect $\hat\lambda,\hat\rho_{J+1},\ldots,\hat\rho_K$ or their standard errors; this is reflected in the fact that the variance-covariance matrix estimator of the subvector $(\hat\lambda,\hat\rho_{J+1},\ldots,\hat\rho_K)$ obtained from (6.16) is identical to the robust formula which ignores the correction factor (see the appendix for a proof). Again, the scale corrected t-statistics of the elements of $\hat\beta$ will generally be different from the robust t-statistics which ignore the estimation of the scale parameters.

The formula for the asymptotic covariance matrix estimator when $\lambda$ is restricted to be zero (or any other fixed value) is obtained by omitting $\nu$ from the model and redefining the nuisance parameters to be $\pi \equiv \eta$. The gradients $\nabla_\theta\mu_t(\theta,\pi)$ and $\nabla_\pi\mu_t(\theta,\pi)$ are also appropriately redefined. In this case, at least for cross section or static time series applications, one might treat $\hat\eta$ as nonrandom if the $x_t$ are being treated as nonrandom. The usual robust formula would be asymptotically valid in this case.

7.  Some  Practical  Considerations 

The  model  presented  in  section  5  contains  as  special  cases  many  of  the 
functional  forms  used  by  applied  econometricians  in  studies  involving 
nonnegative  variables.   The  generality  obtained  means  that  more  work  is 
involved  in  selecting  an  acceptable  model.   In  addition,  the  problem  of 
choosing  the  weights  to  compute  the  WNLS  estimators  is  important  for  actual 
implementation.   Although  every  application  has  unique  features,  some  general 
guidelines  can  be  given. 

First, one has to decide on which restricted version of the conditional mean function (5.1) to start with. For computational reasons this would almost always involve $\lambda = 1$ or $\lambda = 0$, possibly along with other constraints on the $\rho_j$. Because the constant elasticity and constant semi-elasticity forms are so appealing for nonnegative variables, a good starting point is $\lambda = 0$ and $\rho_j = 0$, $j=J+1,\ldots,K$, or $\lambda = 0$ and $\rho_j = 1$, $j=J+1,\ldots,K$. Then $\beta$ can be initially estimated by NLS of an exponential regression function, which is relatively easy.
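As an illustration of this first step, NLS for the exponential regression function $\exp(x\beta)$ can be sketched with a damped Gauss-Newton iteration. This is a minimal sketch under stated assumptions, not the paper's algorithm: the starting values (intercept $\log\bar y$, zero slopes), the step-halving rule, and all function names are hypothetical choices, and the first column of `X` is assumed to be the constant.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def fitted_values(X, beta):
    """exp(x_t beta) for every row x_t of X."""
    return [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]


def ssr(X, y, beta):
    """Sum of squared residuals of the exponential regression."""
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted_values(X, beta)))


def nls_exponential(X, y, max_iter=200, tol=1e-12):
    """Damped Gauss-Newton for min_beta sum (y_t - exp(x_t beta))**2."""
    k = len(X[0])
    # Hypothetical starting values: intercept log(mean y), slopes zero.
    beta = [math.log(sum(y) / len(y))] + [0.0] * (k - 1)
    for _ in range(max_iter):
        fit = fitted_values(X, beta)
        # The gradient of exp(x beta) with respect to beta is exp(x beta) * x.
        grad = [[f * x for x in row] for f, row in zip(fit, X)]
        resid = [yi - f for yi, f in zip(y, fit)]
        gtg = [[sum(g[i] * g[j] for g in grad) for j in range(k)] for i in range(k)]
        gtr = [sum(g[i] * r for g, r in zip(grad, resid)) for i in range(k)]
        step = solve(gtg, gtr)
        current, scale = ssr(X, y, beta), 1.0
        # Halve the step until the sum of squared residuals does not increase.
        while scale > 1e-10:
            trial = [b + scale * s for b, s in zip(beta, step)]
            if ssr(X, y, trial) <= current:
                break
            scale /= 2.0
        else:
            break  # no acceptable step was found; keep the current beta
        moved = max(abs(t - b) for t, b in zip(trial, beta))
        beta = trial
        if moved < tol:
            break
    return beta
```

The fitted values and residuals from such a first-stage fit are the inputs to the weighting schemes and LM tests discussed in this section.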

If $y$ is continuously distributed on $[0,\infty)$ then one sensible choice of $\hat\omega_t$ is the square of the fitted values from the NLS estimation. This is the optimal choice if y given x has an exponential distribution or if log y given x is normal with constant variance. As these distributional assumptions are leading cases for nonnegative continuously distributed variables, this choice of weights makes some sense. (But recall that the analysis in sections 3, 4, and 5 does not actually require that the weighting function be proportional to V(y|x).) If y is a count variable then a sensible choice of weights is simply the predicted values from the NLS estimation, as this is optimal when y given x has a Poisson distribution. Rather than performing the WNLS estimation in two steps, quasi-maximum likelihood estimation using the Exponential or Poisson distributions can be implemented directly. A more flexible set of weights can be obtained as the fitted values from the OLS regression

(7.1)  $\hat\varepsilon_t^2$  on  $1,\ \mu(x_t,\hat\theta),\ [\mu(x_t,\hat\theta)]^2$,

where the fitted values $\mu(x_t,\hat\theta)$ and residuals $\hat\varepsilon_t$ would most likely come from an initial NLS estimation. This contains as a special case the optimal weights for a geometric distribution (which is in the linear exponential family), where, asymptotically, the intercept in regression (7.1) would be zero and the coefficients on $\mu(x_t,\theta)$ and $[\mu(x_t,\theta)]^2$ would both be unity.

There is no reason to restrict the weights to functions of the fitted values; a simple and fairly flexible approach is to exponentiate the fitted values from the regression

$\log\hat\varepsilon_t^2$  on  $1,\ x_{t2},\ldots,x_{tK}$,  $t=1,\ldots,N$,

where $\hat\varepsilon_t$ are the NLS residuals. While this choice of $\hat\omega_t$ need not produce consistent estimates of V(y|x) (up to scale), it could still improve the precision of the WNLS estimator relative to that of the NLS estimator.
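The two regression-based weighting schemes just described can be sketched as follows. This is only an illustration: the helper names are hypothetical, `mu_hat` and `resid` are assumed to come from an initial NLS fit, and the residuals must be nonzero for the logarithm in the second scheme to exist.

```python
import math


def solve(a, b):
    """Solve the linear system a z = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z


def ols(Z, v):
    """OLS coefficients and fitted values from regressing v on the columns of Z."""
    k = len(Z[0])
    ztz = [[sum(row[i] * row[j] for row in Z) for j in range(k)] for i in range(k)]
    ztv = [sum(row[i] * vi for row, vi in zip(Z, v)) for i in range(k)]
    coef = solve(ztz, ztv)
    fitted = [sum(c * z for c, z in zip(coef, row)) for row in Z]
    return coef, fitted


def weights_quadratic(mu_hat, resid):
    """Regression (7.1): squared residuals on 1, mu_hat and mu_hat**2."""
    Z = [[1.0, m, m * m] for m in mu_hat]
    coef, fitted = ols(Z, [e * e for e in resid])
    # In finite samples some fitted values can be nonpositive; such weights
    # would need to be handled before they are used in WNLS.
    return coef, fitted


def weights_log_linear(X, resid):
    """Exponentiated fitted values from log(resid**2) on the columns of X."""
    _, fitted = ols(X, [math.log(e * e) for e in resid])
    return [math.exp(f) for f in fitted]
```

The simpler choices mentioned earlier need no regression at all: the squared fitted values `[m * m for m in mu_hat]` for a continuous $y$, or the fitted values themselves for a count variable. The log-linear scheme always yields strictly positive weights, which is one practical reason to consider it.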

It should be emphasized that weighted NLS need only be considered as an alternative to NLS if the researcher feels that the variance of the NLS estimator is too large. As a test of model specification, however, it is frequently useful to compute the NLS and a weighted NLS estimator and compare the two via a Hausman test. Wooldridge (1990) covers robust, regression-based Hausman tests that apply in this context.

Once a set of weights has been selected, the LM tests developed in section 5 can be used to test $H_0$: $\lambda = 0$, $\rho_j = 0$, $j=J+1,\ldots,K$ or $H_0$: $\lambda = 0$, $\rho_j = 1$, $j=J+1,\ldots,K$. If both of these hypotheses are rejected then the less restrictive hypothesis $H_0$: $\lambda = 0$ can be tested. If model (5.2) is rejected entirely then one might turn to the model with $\lambda = 1$. The hypotheses $H_0$: $\lambda = 1$, $\rho_j = 0$, $j=J+1,\ldots,K$, $H_0$: $\lambda = 1$, $\rho_j = 1$, $j=J+1,\ldots,K$, and $H_0$: $\lambda = 1$ are of particular interest. All of these LM tests are invariant to the scaling of y and/or x.

If all versions of the models $\lambda = 0$ and $\lambda = 1$ are rejected then perhaps the unrestricted model needs to be estimated. The lack of scale invariance of the t-statistics then becomes an issue, and the methods of section 6 can be used. Of course nothing guarantees that the general model (5.1) is correctly specified for E(y|x); the Hausman test can be applied to test for misspecification of the general model.

Another problem arises if the data cannot reject either $\lambda = 0$ or $\lambda = 1$. In this case one might turn to a nonnested hypotheses test to attempt to distinguish between the two models. Computationally simple robust nonnested hypotheses tests are discussed in Wooldridge (1990). A simple goodness of fit criterion is to choose the model with the smallest SSR from NLS estimation or from WNLS estimation using the same set of weights.

8.  Concluding Remarks

The Box-Cox transformation has proven to be a useful tool for generalizing functional form in statistics and econometrics. It is not, however, well-suited for applications where interest centers on E(y|x) rather than on the conditional expectation of some nonlinear transformation of y. When y is the quantity of interest to economic agents and policy makers it is important to have available estimates of E(y|x) that are easy to compute and robust to distributional misspecification. Estimating a linear model where the regressand is, say, $y^{1/2}$, is not very useful unless E(y|x) can be recovered from $E(y^{1/2}|x)$. In the Box-Cox framework with $\lambda = 1/2$ computation of E(y|x) requires normality and homoskedasticity of $y^{1/2}$.

Requiring that some power transformation simultaneously induce linearity of the conditional expectation, homoskedasticity, and normality is asking a lot of economic data, and is not in itself important for estimating economic quantities. This paper has proposed as an alternative estimating a nonlinear model for E(y|x) that is flexible enough to contain several special cases that are used frequently by applied researchers. Further, no second moment or other distributional assumptions are relied upon to obtain consistent estimates or to perform asymptotically valid inference. As a consequence all of the robust LM tests of the special cases covered in sections 3 and 5 are pure conditional mean tests: a rejection can be confidently interpreted as a rejection of the model for E(y|x), and not as a rejection of some other less important distributional assumption.

Mathematical Appendix

The results in this appendix require that the vector of explanatory variables contains a constant. Thus, $x_{t1} \equiv 1$, $t=1,2,\ldots$ is assumed throughout. For notational simplicity, the results are proven for the unweighted case; it is obvious how the proofs are modified for weighted NLS. Claims 1-4 pertain to the model of section 3.

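Claim 1:  $\lambda^+ = \hat\lambda$;  $\beta_1^+ = (c_0^{\hat\lambda} - 1)\hat\lambda^{-1} + c_0^{\hat\lambda}\hat\beta_1$;  $\beta_j^+ = c_0^{\hat\lambda}\hat\beta_j$,  $j=2,\ldots,K$.

Proof:  Let $\mu_t(\beta,\lambda) \equiv [1 + \lambda x_t\beta]^{1/\lambda}$, and let $\nabla_\beta\mu_t(\beta,\lambda)$ and $\nabla_\lambda\mu_t(\beta,\lambda)$ denote the derivatives. Then the first order conditions for $(\beta^+,\lambda^+)$ are

$\sum_{t=1}^N \nabla_\beta\mu_t(\beta^+,\lambda^+)'(c_0y_t - \mu_t(\beta^+,\lambda^+)) = 0$,  (a.1)

$\sum_{t=1}^N \nabla_\lambda\mu_t(\beta^+,\lambda^+)'(c_0y_t - \mu_t(\beta^+,\lambda^+)) = 0$.  (a.2)

Because the solutions are assumed to be unique, it suffices to show that $\beta^+$ and $\lambda^+$ given by (6.1) satisfy (a.1) and (a.2). Since $1 + \lambda^+x_t\beta^+ = c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta)$, we have $\mu_t(\beta^+,\lambda^+) = c_0\mu_t(\hat\beta,\hat\lambda)$ and $\nabla_\beta\mu_t(\beta^+,\lambda^+) = c_0^{1-\hat\lambda}\nabla_\beta\mu_t(\hat\beta,\hat\lambda)$. Then (a.1) reduces to showing

$\sum_{t=1}^N c_0^{1-\hat\lambda}\nabla_\beta\mu_t(\hat\beta,\hat\lambda)'(c_0y_t - c_0\mu_t(\hat\beta,\hat\lambda)) = c_0^{2-\hat\lambda}\sum_{t=1}^N \nabla_\beta\mu_t(\hat\beta,\hat\lambda)'(y_t - \mu_t(\hat\beta,\hat\lambda)) = 0$.  (a.3)

But (a.3) follows from the first order condition for $(\hat\beta,\hat\lambda)$. Next, from (6.4),

$\sum_{t=1}^N \nabla_\lambda\mu_t(\beta^+,\lambda^+)'(c_0y_t - \mu_t(\beta^+,\lambda^+)) = \sum_{t=1}^N \{c_0\nabla_\lambda\mu_t(\hat\beta,\hat\lambda) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta) - 1]\}(c_0y_t - c_0\mu_t(\hat\beta,\hat\lambda))$.  (a.4)

The first term in (a.4) is zero by the first order condition for $(\hat\beta,\hat\lambda)$. The first order condition for $(\hat\beta,\hat\lambda)$ also implies that the second part of (a.4) is zero. This is because

$\sum_{t=1}^N [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(y_t - \mu_t(\hat\beta,\hat\lambda)) = 0$

is the first element of $\sum_{t=1}^N \nabla_\beta\mu_t(\hat\beta,\hat\lambda)'(y_t - \mu_t(\hat\beta,\hat\lambda))$ if $x_{t1} \equiv 1$. Also,

$\sum_{t=1}^N [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(1 + \hat\lambda x_t\hat\beta)(y_t - \mu_t(\hat\beta,\hat\lambda))$  (a.5)
$\quad = \sum_{t=1}^N [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}(y_t - \mu_t(\hat\beta,\hat\lambda)) + \hat\lambda\sum_{t=1}^N [1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}x_t\hat\beta(y_t - \mu_t(\hat\beta,\hat\lambda)) = 0$

since this is a linear combination of $\sum_{t=1}^N \nabla_\beta\mu_t(\hat\beta,\hat\lambda)'(y_t - \mu_t(\hat\beta,\hat\lambda))$. This establishes (a.1) and (a.2) for $(\beta^+,\lambda^+)$ given by (6.4), and completes the proof.  ■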


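Claim 2:  $se(\hat\lambda)$ is scale invariant.

Proof:  It is shown that the asymptotic variance of $\hat\lambda$ is scale invariant. A standard mean-value expansion yields

$\sqrt{N}(\hat\theta - \theta) = \left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t(\theta)'\nabla_\theta\mu_t(\theta)\right]^{-1}N^{-1/2}\sum_{t=1}^N \nabla_\theta\mu_t(\theta)'\varepsilon_t + o_p(1)$.  (a.6)

Focusing on the last element of (a.6), that corresponding to $\hat\lambda$, we have

$\sqrt{N}(\hat\lambda - \lambda) = \left[N^{-1}\sum_{t=1}^N r_t^2\right]^{-1}N^{-1/2}\sum_{t=1}^N r_t\varepsilon_t + o_p(1)$  (a.7)

where $r_t$ is the residual from the regression

$\nabla_\lambda\mu_t(\theta)$  on  $\nabla_\beta\mu_t(\theta)$,  $t=1,\ldots,N$.  (a.8)

The true error $\varepsilon_t$ is scaled up by $c_0$ when $y_t$ is scaled up by $c_0$, so it suffices to show that $r_t$ is also scaled up by $c_0$. Then the expression (a.7) is scale invariant, and then so must be the asymptotic variance of $\hat\lambda$. Let values superscripted by "o" denote the scaled values. Then

$\nabla_\beta\mu_t(\theta^o) = c_0^{1-\lambda}\nabla_\beta\mu_t(\theta)$  (a.9)

$\nabla_\lambda\mu_t(\theta^o) = c_0\nabla_\lambda\mu_t(\theta) + \lambda^{-2}c_0^{1-\lambda}[1 + \lambda x_t\beta]^{(1/\lambda)-1}[c_0^{\lambda} - \lambda\log(c_0)c_0^{\lambda}(1 + \lambda x_t\beta) - 1]$.  (a.10)

From (a.10), $\nabla_\lambda\mu_t(\theta^o)$ can be expressed succinctly as $\nabla_\lambda\mu_t(\theta^o) = c_0\nabla_\lambda\mu_t(\theta) + \nabla_\beta\mu_t(\theta)a$ for a $K \times 1$ vector $a$. From (a.9), $\nabla_\beta\mu_t(\theta^o)$ is a scalar multiple of $\nabla_\beta\mu_t(\theta)$. Consequently, the residuals from the regression

$\nabla_\lambda\mu_t(\theta^o)$  on  $\nabla_\beta\mu_t(\theta^o)$,

say $r_t^o$, satisfy $r_t^o = c_0r_t$; thus,

$\left[N^{-1}\sum_{t=1}^N (r_t^o)^2\right]^{-1}N^{-1/2}\sum_{t=1}^N r_t^o\varepsilon_t^o = \left[N^{-1}\sum_{t=1}^N r_t^2\right]^{-1}N^{-1/2}\sum_{t=1}^N r_t\varepsilon_t$,

and the asymptotic variance of $\hat\lambda$ is invariant. That the computed standard errors are invariant follows because $\hat r_t$ and $\hat\varepsilon_t$ are also scaled up by $c_0$.  ■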

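Claim 3:  The LM statistic for exclusion restrictions is scale invariant.

Proof:  Consider the unrestricted model

$\mu(x,z;\beta,\lambda,\delta) = [1 + \lambda x\beta + \lambda z\delta]^{1/\lambda}$

and consider testing $H_0$: $\delta = 0$. If the regressand is $c_0y_t$ for $c_0 > 0$ then the gradients used for the test on scaled data are related by

$\nabla_\beta\mu_t(\beta^+,\lambda^+,0) = c_0^{1-\hat\lambda}\nabla_\beta\mu_t(\hat\beta,\hat\lambda,0)$

$\nabla_\lambda\mu_t(\beta^+,\lambda^+,0) = c_0\nabla_\lambda\mu_t(\hat\beta,\hat\lambda,0) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t\hat\beta) - 1]$

$\nabla_\delta\mu_t(\beta^+,\lambda^+,0) = c_0^{1-\hat\lambda}\nabla_\delta\mu_t(\hat\beta,\hat\lambda,0)$.

Because the gradients of the scaled data are linear combinations of the gradients for the unscaled data, and because $\varepsilon_t^+ = c_0\hat\varepsilon_t$, the r-squareds from the regressions

$\varepsilon_t^+$  on  $\nabla_\beta\mu_t(\beta^+,\lambda^+,0),\ \nabla_\lambda\mu_t(\beta^+,\lambda^+,0),\ \nabla_\delta\mu_t(\beta^+,\lambda^+,0)$

and

$\hat\varepsilon_t$  on  $\nabla_\beta\mu_t(\hat\beta,\hat\lambda,0),\ \nabla_\lambda\mu_t(\hat\beta,\hat\lambda,0),\ \nabla_\delta\mu_t(\hat\beta,\hat\lambda,0)$

are numerically identical. This shows that the nonrobust LM statistics are numerically identical. For the robust test, note that the residuals $r_t^+$ from the regression

$\nabla_\delta\mu_t(\beta^+,\lambda^+,0)$  on  $\nabla_\beta\mu_t(\beta^+,\lambda^+,0),\ \nabla_\lambda\mu_t(\beta^+,\lambda^+,0)$

are scalar multiples of the residuals $\hat r_t$ from the regression

$\nabla_\delta\mu_t(\hat\beta,\hat\lambda,0)$  on  $\nabla_\beta\mu_t(\hat\beta,\hat\lambda,0),\ \nabla_\lambda\mu_t(\hat\beta,\hat\lambda,0)$

($r_t^+ = c_0^{1-\hat\lambda}\hat r_t$). Consequently, the r-squareds from the regressions

1  on  $\varepsilon_t^+r_t^+$

and

1  on  $\hat\varepsilon_t\hat r_t$

are identical.  ■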


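Claim 4:  The scale corrected standard error of $\hat\lambda$ is identical to the robust standard error of $\hat\lambda$ (y has been scaled by the geometric mean $\hat\nu$).

Proof:  $\hat\theta \equiv (\hat\beta,\hat\lambda)$ now satisfies

$\sum_{t=1}^N \nabla_\theta\mu_t(\hat\theta,\hat\nu)'(y_t - \mu_t(\hat\theta,\hat\nu)) = 0$

where $\mu_t(\theta,\nu) \equiv \nu[1 + \lambda x_t\beta]^{1/\lambda}$ and $\nabla_\beta\mu_t(\theta,\nu)$ and $\nabla_\lambda\mu_t(\theta,\nu)$ are also scaled up by $\nu$. A mean value expansion along with the delta method yields

$\sqrt{N}(\hat\theta - \theta) = \left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\theta\mu_t\right]^{-1}N^{-1/2}\sum_{t=1}^N \nabla_\theta\mu_t'\varepsilon_t$  (a.11)
$\quad - \left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\theta\mu_t\right]^{-1}\left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\nu\mu_t\right]N^{-1/2}\sum_{t=1}^N \nu\log(y_t/\nu) + o_p(1)$,

where all elements are evaluated at $(\theta,\nu)$. The second term in (a.11) is the contribution due to the estimation of $\nu$; the first term is as before. Thus, it suffices to show that the last element of

$\left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\theta\mu_t\right]^{-1}\left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\nu\mu_t\right]$  (a.12)

(the element corresponding to $\lambda$) is identically zero. But (a.12) is the vector of coefficients from the regression

$\nabla_\nu\mu_t$  on  $\nabla_\theta\mu_t$,  $t=1,\ldots,N$

or

$\nabla_\nu\mu_t$  on  $\nabla_\beta\mu_t,\ \nabla_\lambda\mu_t$,  $t=1,\ldots,N$.

The coefficient on $\nabla_\lambda\mu_t$ is also obtained by first obtaining the residuals $r_t$ from

$\nabla_\lambda\mu_t$  on  $\nabla_\beta\mu_t$,  $t=1,\ldots,N$,

and then computing the coefficient from the simple regression

$\nabla_\nu\mu_t$  on  $r_t$.

Thus, it suffices to show that $\nabla_\nu\mu_t$ and $r_t$ are orthogonal. But the residuals $r_t$ are orthogonal to any linear combination of $\nabla_\beta\mu_t$, i.e.

$\sum_{t=1}^N \nabla_\beta\mu_t'r_t = 0$,

and $\nabla_\nu\mu_t = [1 + \lambda x_t\beta]^{1/\lambda} = [1 + \lambda x_t\beta]^{(1/\lambda)-1} + \lambda[1 + \lambda x_t\beta]^{(1/\lambda)-1}x_t\beta$, which is a linear combination of $\nabla_\beta\mu_t$ whenever $x_t$ contains a constant. Thus, for $\hat\lambda$, (a.11) is the same as (a.6). This completes the proof.  ■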
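The orthogonality step in this proof can be illustrated numerically. The sketch below assumes the model $\mu_t(\theta,\nu) = \nu[1 + \lambda x_t\beta]^{1/\lambda}$ with a constant and one further regressor; the function names and the closed-form two-regressor OLS are illustrative choices only.

```python
import math


def gradients(x, beta, lam, nu):
    """Return (grad_beta, grad_lambda, grad_nu) of nu * [1 + lam * x beta]**(1/lam)."""
    index = sum(b * xi for b, xi in zip(beta, x))
    b = 1.0 + lam * index
    g_beta = [nu * b ** (1.0 / lam - 1.0) * xi for xi in x]
    g_lam = nu * b ** (1.0 / lam) * (-math.log(b) / lam ** 2 + index / (lam * b))
    g_nu = b ** (1.0 / lam)
    return g_beta, g_lam, g_nu


def residuals_two_regressors(Z, v):
    """Residuals from OLS of v on the two columns of Z (2x2 normal equations)."""
    a11 = sum(z[0] * z[0] for z in Z)
    a12 = sum(z[0] * z[1] for z in Z)
    a22 = sum(z[1] * z[1] for z in Z)
    b1 = sum(z[0] * vi for z, vi in zip(Z, v))
    b2 = sum(z[1] * vi for z, vi in zip(Z, v))
    det = a11 * a22 - a12 * a12
    c1 = (a22 * b1 - a12 * b2) / det
    c2 = (a11 * b2 - a12 * b1) / det
    return [vi - c1 * z[0] - c2 * z[1] for z, vi in zip(Z, v)]


def orthogonality_check(xs, beta, lam, nu):
    """Sum over t of grad_nu * r_t, where r_t are residuals of grad_lambda on grad_beta."""
    grads = [gradients([1.0, x], beta, lam, nu) for x in xs]
    r = residuals_two_regressors([g[0] for g in grads], [g[1] for g in grads])
    return sum(g[2] * rt for g, rt in zip(grads, r))
```

Because the gradient with respect to $\nu$ lies in the column space of the gradient with respect to $\beta$ when $x_t$ contains a constant, `orthogonality_check` should return a value that is zero up to rounding error for any admissible parameter values.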


Claims 5-9 pertain to the general model (5.1).

Claim 5:  Consider the model

$\mu(x;\beta,\lambda,\rho) = [1 + \lambda x(\rho)\beta]^{1/\lambda}$,

where $x(\rho) \equiv (1,x_2,\ldots,x_J,x_{J+1}(\rho_{J+1}),\ldots,x_K(\rho_K))$. Suppose the scaled data are $c_0y_t, c_{J+1}x_{t,J+1},\ldots,c_Kx_{tK}$, where $c_j > 0$. Then the relationships between the estimators using scaled data and unscaled data are

$\lambda^+ = \hat\lambda$;  $\rho_j^+ = \hat\rho_j$,  $j=J+1,\ldots,K$;

$\beta_1^+ = (c_0^{\hat\lambda} - 1)\hat\lambda^{-1} + c_0^{\hat\lambda}\hat\beta_1 + c_0^{\hat\lambda}\sum_{j=J+1}^K \hat\beta_j(c_j^{-\hat\rho_j} - 1)/\hat\rho_j$;

$\beta_j^+ = c_0^{\hat\lambda}\hat\beta_j$,  $j=2,\ldots,J$;  $\beta_j^+ = c_0^{\hat\lambda}c_j^{-\hat\rho_j}\hat\beta_j$,  $j=J+1,\ldots,K$.

Proof:  This is similar to the proof of Claim 1. The first order condition for $\theta^+ \equiv (\beta^+,\lambda^+,\rho^+)$ is given by

$\sum_{t=1}^N \nabla_\theta\mu(x_t^+;\theta^+)'(c_0y_t - \mu(x_t^+;\theta^+)) = 0$;  (a.13)

here, $x_t^+$ denotes the scaled regressors $x_tC$, where $C \equiv \mathrm{diag}(1,\ldots,1,c_{J+1},\ldots,c_K)$. Showing that the above choice of $\theta^+$ solves (a.13) relies on the following relationships:

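$\nabla_{\beta_j}\mu(x_t^+;\theta^+) = c_0^{1-\hat\lambda}\nabla_{\beta_j}\mu(x_t;\hat\theta)$,  $j=1,\ldots,J$;
$\nabla_{\beta_j}\mu(x_t^+;\theta^+) = c_0^{1-\hat\lambda}[c_j^{\hat\rho_j}\nabla_{\beta_j}\mu(x_t;\hat\theta) + \hat\rho_j^{-1}(c_j^{\hat\rho_j} - 1)\nabla_{\beta_1}\mu(x_t;\hat\theta)]$,  $j=J+1,\ldots,K$  (a.14)

$\nabla_\lambda\mu(x_t^+;\theta^+) = c_0\nabla_\lambda\mu(x_t;\hat\theta) + \hat\lambda^{-2}c_0^{1-\hat\lambda}[1 + \hat\lambda x_t(\hat\rho)\hat\beta]^{(1/\hat\lambda)-1}[c_0^{\hat\lambda} - \hat\lambda\log(c_0)c_0^{\hat\lambda}(1 + \hat\lambda x_t(\hat\rho)\hat\beta) - 1]$  (a.15)

$\nabla_{\rho_j}\mu(x_t^+;\theta^+) = c_0\nabla_{\rho_j}\mu(x_t;\hat\theta) + c_0\log(c_j)\hat\beta_j\nabla_{\beta_j}\mu(x_t;\hat\theta)$  (a.16)
$\quad + c_0\hat\beta_j[(c_j^{-\hat\rho_j}/\hat\rho_j - 1/\hat\rho_j)/\hat\rho_j + \log(c_j)/\hat\rho_j]\nabla_{\beta_1}\mu(x_t;\hat\theta)$,  $j=J+1,\ldots,K$.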

Equations (a.14), (a.15), and (a.16) show that $\nabla_\theta\mu(x_t^+;\theta^+)$ is a linear combination of $\nabla_\theta\mu(x_t;\hat\theta)$. Because $\varepsilon_t^+ = c_0\hat\varepsilon_t$, it follows that

$\sum_{t=1}^N \nabla_\theta\mu(x_t^+;\theta^+)'\varepsilon_t^+ = 0$,

which establishes (a.13).
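The parameter mapping in Claim 5 can be sanity-checked by confirming that it reproduces the scaled mean, $\mu(x_t^+;\theta^+) = c_0\mu(x_t;\hat\theta)$, so that $\varepsilon_t^+ = c_0\hat\varepsilon_t$. The sketch below is only an illustration under the assumptions $\lambda \neq 0$ and $\rho_j \neq 0$; the function names and the split into untransformed (`x_lin`, first entry equal to one) and Box-Cox (`x_bc`) regressors are hypothetical.

```python
def box_cox(u, rho):
    """Box-Cox transform (u**rho - 1)/rho for rho != 0."""
    return (u ** rho - 1.0) / rho


def mu(x_lin, x_bc, beta_lin, beta_bc, lam, rho):
    """[1 + lam * x(rho) beta]**(1/lam), with x_lin[0] equal to one."""
    index = sum(b * x for b, x in zip(beta_lin, x_lin))
    index += sum(b * box_cox(x, r) for b, x, r in zip(beta_bc, x_bc, rho))
    return (1.0 + lam * index) ** (1.0 / lam)


def rescale_parameters(beta_lin, beta_bc, lam, rho, c0, c):
    """Map estimates for unscaled data into estimates for data scaled by c0 and c."""
    k = c0 ** lam
    new_bc = [k * cj ** (-r) * b for b, cj, r in zip(beta_bc, c, rho)]
    new_lin = [k * b for b in beta_lin]
    # The intercept absorbs the shift caused by scaling y and the Box-Cox regressors.
    new_lin[0] += (k - 1.0) / lam
    new_lin[0] += k * sum(b * (cj ** (-r) - 1.0) / r
                          for b, cj, r in zip(beta_bc, c, rho))
    return new_lin, new_bc
```

For any observation, evaluating `mu` at the scaled regressors `[cj * x for cj, x in zip(c, x_bc)]` with the rescaled coefficients should equal `c0` times `mu` at the original values, while `lam` and `rho` are left unchanged.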


Claim 6:  The estimators $\hat\beta_2,\ldots,\hat\beta_J$ and the associated standard errors are invariant to scaling of only $x_{J+1},\ldots,x_K$.

Proof:  In this case, $c_0 = 1$, so that $\beta_j^+ = \hat\beta_j$, $j=2,\ldots,J$ follows from Claim 5. Also, $\varepsilon_t^+ = \hat\varepsilon_t$, $\nabla_{\beta_j}\mu(x_t^+;\theta^+) = \nabla_{\beta_j}\mu(x_t;\hat\theta)$ for $j=1,\ldots,J$, $\nabla_\lambda\mu(x_t^+;\theta^+) = \nabla_\lambda\mu(x_t;\hat\theta)$, and

$\nabla_{\rho_j}\mu(x_t^+;\theta^+) = \nabla_{\rho_j}\mu(x_t;\hat\theta) + \log(c_j)\hat\beta_j\nabla_{\beta_j}\mu(x_t;\hat\theta)$
$\quad + \hat\beta_j[(c_j^{-\hat\rho_j}/\hat\rho_j - 1/\hat\rho_j)/\hat\rho_j + \log(c_j)/\hat\rho_j]\nabla_{\beta_1}\mu(x_t;\hat\theta)$,  $j=J+1,\ldots,K$.

As in the proof of Claim 2, the standard error of, say, $\beta_2^+ = \hat\beta_2$ depends only on $\hat\varepsilon_t$ and the residuals $\hat r_t$ from the regression

$\nabla_{\beta_2}\mu_t$  on  $\nabla_{\beta_1}\mu_t,\ \nabla_{\beta_3}\mu_t,\ldots,\nabla_{\beta_K}\mu_t,\ \nabla_\lambda\mu_t,\ \nabla_\rho\mu_t$,  $t=1,\ldots,N$.

Because of the above relationships, these residuals are independent of the scale variables $c_{J+1},\ldots,c_K$. A similar argument in fact shows that the asymptotic variance of the subvector $(\hat\beta_2,\ldots,\hat\beta_J)$ is invariant with respect to the scale coefficients $c_{J+1},\ldots,c_K$.  ■


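Claim 7:  The asymptotic variance of $(\hat\lambda,\hat\rho_{J+1},\ldots,\hat\rho_K)$ is independent of $c_0,c_{J+1},\ldots,c_K$.

Proof:  As in the proof of Claim 2, a mean value expansion shows that

$\sqrt{N}[(\hat\lambda,\hat\rho) - (\lambda,\rho)]' = \left[N^{-1}\sum_{t=1}^N r_t^{o\prime}r_t^o\right]^{-1}N^{-1/2}\sum_{t=1}^N r_t^{o\prime}\varepsilon_t^o + o_p(1)$,

where the $1 \times (1+K-J)$ vectors $r_t^o$ are the residuals from the regression

$\nabla_\lambda\mu_t,\ \nabla_\rho\mu_t$  on  $\nabla_\beta\mu_t$,  $t=1,\ldots,N$.  (a.17)

The gradients in (a.17) are evaluated at the scaled $x_t$ and the scaled true coefficients $\theta^o$. Because $\varepsilon_t^o = c_0\varepsilon_t$, it suffices to show that $r_t^o = c_0r_t$, where $r_t$ are the residuals from

$\nabla_\lambda\mu_t,\ \nabla_\rho\mu_t$  on  $\nabla_\beta\mu_t$,  $t=1,\ldots,N$;  (a.18)

the gradients in (a.18) are evaluated at the unscaled quantities $x_t$ and $\theta$. But the population analogs of (a.14)-(a.16) show that $\nabla_\beta\mu_t^o$ is a nonsingular linear transformation of $\nabla_\beta\mu_t$, and that

$\nabla_\lambda\mu_t^o = c_0\nabla_\lambda\mu_t + \nabla_\beta\mu_ta$

$\nabla_{\rho_j}\mu_t^o = c_0\nabla_{\rho_j}\mu_t + d_j\nabla_{\beta_j}\mu_t + f_j\nabla_{\beta_1}\mu_t$,  $j=J+1,\ldots,K$,

for constants $d_j$ and $f_j$; it immediately follows that $r_t^o = c_0r_t$, and this shows that the asymptotic variance of $(\hat\lambda,\hat\rho)$ is scale invariant.  ■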


Claim 8:  LM tests for exclusion restrictions are independent of $c_0,c_{J+1},\ldots,c_K$.

Proof:  The unrestricted model is

$\mu(x,z;\beta,\lambda,\rho,\delta) = [1 + \lambda x(\rho)\beta + \lambda z\delta]^{1/\lambda}$

and the null hypothesis is $H_0$: $\delta = 0$. The relationships between the restricted gradients for the scaled and unscaled data are given by (a.14)-(a.16) for the parameters $\beta$, $\lambda$, and $\rho$. The gradient for $\delta$ evaluated at $\delta = 0$ is easily seen to satisfy

$\nabla_\delta\mu(x_t^+;\theta^+,0) = c_0^{1-\hat\lambda}\nabla_\delta\mu(x_t;\hat\theta,0)$.

Thus, the r-squared from the regression

$\varepsilon_t^+$  on  $\nabla_\beta\mu_t^+,\ \nabla_\lambda\mu_t^+,\ \nabla_\rho\mu_t^+,\ \nabla_\delta\mu_t^+$

is independent of $c_0,c_{J+1},\ldots,c_K$. A similar argument establishes that the robust LM test is also invariant.  ■

Claim 9:  In the model (6.10) with initial scale estimates $\hat\nu$, $\hat\eta$, (6.16) is a consistent estimator of the asymptotic variance of $\hat\theta$. Moreover, the submatrix corresponding to $(\hat\lambda,\hat\rho)$ is unaffected by the asymptotic variance of $\hat\nu$, $\hat\eta$.

Proof:  As in the proof of Claim 4, a standard mean value expansion shows that

$\sqrt{N}(\hat\theta - \theta) = \left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\theta\mu_t\right]^{-1}\left\{N^{-1/2}\sum_{t=1}^N \nabla_\theta\mu_t'\varepsilon_t - \left[N^{-1}\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\pi\mu_t\right]N^{-1/2}\sum_{t=1}^N g_t'\right\} + o_p(1)$,  (a.19)

where $\theta \equiv (\beta',\lambda,\rho')'$, $\pi \equiv (\nu,\eta')'$, and $g_t$ is the $1 \times (1+K-J)$ vector

$g_t \equiv [\nu\log(y_t/\nu),\ \eta_{J+1}\log(x_{t,J+1}/\eta_{J+1}),\ldots,\eta_K\log(x_{tK}/\eta_K)]$.

(Note that $E(g_t) = 0$.) Thus,

$\sqrt{N}(\hat\theta - \theta) = A_N^{-1}[I_P\,|\,-C_N]\,N^{-1/2}\sum_{t=1}^N (\nabla_\theta\mu_t\varepsilon_t,\ g_t)' + o_p(1)$

where

$A_N \equiv N^{-1}\sum_{t=1}^N E[\nabla_\theta\mu_t'\nabla_\theta\mu_t]$,  $C_N \equiv N^{-1}\sum_{t=1}^N E[\nabla_\theta\mu_t'\nabla_\pi\mu_t]$.

This is written concisely as

$\sqrt{N}(\hat\theta - \theta) = A_N^{-1}[I_P\,|\,-C_N]\,N^{-1/2}\sum_{t=1}^N s_t' + o_p(1)$

where $s_t$ is the $1 \times (P+1+K-J)$ vector $s_t \equiv (\nabla_\theta\mu_t\varepsilon_t,\ g_t)$. Thus, the asymptotic variance of $\sqrt{N}(\hat\theta - \theta)$ is

$A_N^{-1}[I_P\,|\,-C_N]\left\{N^{-1}\sum_{t=1}^N E(s_t's_t)\right\}[I_P\,|\,-C_N]'A_N^{-1}$.

Replacing unknown expectations by their sample counterparts, and the unknown parameters by the estimates $(\hat\theta,\hat\pi)$, yields (6.16) multiplied by N; the asymptotic variance of $\hat\theta$ is obtained by dividing $AV[\sqrt{N}(\hat\theta - \theta)]$ by N. This completes the first part of the assertion.

To establish the second part, we show that the elements in the last $1+(K-J)$ rows of

$\left[\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\theta\mu_t\right]^{-1}\left[\sum_{t=1}^N \nabla_\theta\mu_t'\nabla_\pi\mu_t\right]$  (a.20)

(those corresponding to $(\lambda,\rho)$) are identically zero. Let $R_t$ denote the $1 \times (1+K-J)$ vector of residuals from the matrix regression

$\nabla_\lambda\mu_t,\ \nabla_\rho\mu_t$  on  $\nabla_\beta\mu_t$,  $t=1,\ldots,N$;

then it suffices to show that

$\sum_{t=1}^N R_t'\nabla_\pi\mu_t = 0$.  (a.21)

But (a.21) holds if $\nabla_\pi\mu_t$ is a linear combination of $\nabla_\beta\mu_t$, and this is seen to be the case from (6.14) provided that $x_t$ contains a constant. The sample counterpart of this argument shows that (6.16) is numerically identical to the robust variance estimate for the subvector $(\hat\lambda,\hat\rho)$.  ■